Abstract

We give a quantum algorithm for evaluating a class of boolean formulas (such as NAND trees
and 3-majority trees) on a restricted set of inputs. Due to the structure of the allowed inputs,
our algorithm can evaluate a depth $n$ tree using $O(n^{2+\log\omega})$ queries, where $\omega$ is independent of
$n$ and depends only on the type of subformulas within the tree. We also prove a classical lower
bound of $n^{\Omega(\log\log n)}$ queries, thus showing a (small) super-polynomial speed-up.

1 Introduction 

In problems where quantum super-polynomial speedups are found, there is usually a promise on 
the input. Examples of such promise problems include Simon's algorithm p3], the Deutsch-Jozsa 
algorithm [5] and the hidden subgroup problem [9] (of which Shor's factoring algorithm is a special
case [T3]). 

In fact, Beals et al. [3] show that for a total boolean function (that is, without a promise on
the inputs), it is impossible to obtain a super-polynomial speedup over deterministic (and hence
also over probabilistic) classical algorithms. Expanding on the result, Aaronson and Ambainis
showed in [2] that no super-polynomial speedup over probabilistic classical algorithms is possible
for computing a property of a function $f$ that is invariant under a permutation of the inputs and
the outputs, even with a restriction on the set of functions $f$ considered.

In this paper, we take a quantum algorithm for a total boolean formula, which by [3] cannot
attain a super-polynomial speed-up, and show how to restrict the inputs to the point that a
super-polynomial speed-up over probabilistic classical algorithms can be attained. While
super-polynomial speed-ups have been attained for boolean formulas (for example, standard problems
such as Deutsch-Jozsa could be written as a boolean formula), we don't know of other examples
that restrict the inputs to an extant problem and that can be written naturally as a
straightforward composition of simple boolean gates.

The total boolean formulas we consider are boolean evaluation trees, such as the NAND tree,
which has a quantum algorithm due to Farhi et al. [6]. We show that the existing quantum
algorithm of Reichardt and Spalek [10] for total boolean evaluation trees (with a small tweak)
achieves a super-polynomial speed-up on our restricted set of inputs. We choose our allowed set of
inputs by closely examining the existing quantum algorithm for total functions to find inputs that
are "easy" for the algorithm.

While the restrictions we make on the domain are natural for a quantum algorithm, they are 
not so natural for a classical algorithm, making our bound on classical query complexity the most 
technical part of this paper. We consider an even more limited restriction on the inputs for the
classical case, and show that even with the added promise, any probabilistic classical algorithm fails with
high probability when fewer than a super-polynomial number of queries are used. The additional
restriction considered in our classical proof leads to a problem similar to one considered
by Bernstein and Vazirani called Recursive Fourier Sampling (RFS, also known as the
Bernstein-Vazirani Problem) [3]. RFS is another example of a problem that achieves a super-polynomial
speed-up. Extensions and lower bounds to RFS have been considered in [HO [8]. We will describe 
the connections and differences between our problem and RFS later in this section. 

Our problem is to consider a restricted set of inputs to a boolean evaluation tree. An evaluation
tree for a boolean function $f : \{0,1\}^c \to \{0,1\}$ is a complete $c$-ary tree $T$ of depth $n$ where every
node of $T$ is assigned a bit value. In general, the leaves can have arbitrary values, and for every
non-leaf node $v$ we have

$$\mathrm{Val}(v) = f(\mathrm{Val}(v_1), \ldots, \mathrm{Val}(v_c)).$$

Here $\mathrm{Val}(v)$ is the bit value of $v$, and $v_1, \ldots, v_c$ are the children of $v$ in $T$. We also sometimes say
that a node $v$ corresponds to the function $f$ evaluated at that node. The value of the root is the
value of the tree $T$. We want to determine the value of $T$ by querying leaves of $T$, while making as
few queries as possible.
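
As a concrete illustration of the definition above, the following minimal Python sketch evaluates a complete $c$-ary evaluation tree bottom-up from its leaves. The names `nand` and `evaluate_tree` are ours, introduced only for illustration. The sketch reads every leaf, so it illustrates what the value of the tree is rather than a query-efficient algorithm.

```python
from typing import Callable, Sequence


def nand(bits: Sequence[int]) -> int:
    """NAND gate: outputs 0 only when every input bit is 1."""
    return 0 if all(bits) else 1


def evaluate_tree(f: Callable[[Sequence[int]], int], c: int,
                  leaves: Sequence[int]) -> int:
    """Value of a complete c-ary evaluation tree, leaves given left to right.

    Every non-leaf node v gets Val(v) = f(Val(v_1), ..., Val(v_c)).
    """
    level = list(leaves)
    if not level:
        raise ValueError("the tree needs at least one leaf")
    while len(level) > 1:
        if len(level) % c != 0:
            raise ValueError("the number of leaves must be a power of c")
        # Collapse one level: group c consecutive children under their parent.
        level = [f(level[i:i + c]) for i in range(0, len(level), c)]
    return level[0]


# Depth-2 NAND tree: NAND(NAND(1, 1), NAND(0, 1)) = NAND(0, 1) = 1
root = evaluate_tree(nand, 2, [1, 1, 0, 1])
```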

For most functions $f$, the restriction on inputs is a bit artificial. For the NAND function, the
restriction can be given a slightly more natural interpretation. We will sketch this interpretation
here to give some intuition. It is known that NAND trees correspond to game trees, with the value
at each node denoting whether the player moving can win under perfect play.¹ Each turn, a player
moves to one of two nodes, and if both nodes have the same value, neither move will change the
game's outcome. However, if the two nodes have different values, the choice is critical; one direction
will guarantee a win and the other a loss (under perfect play). Then our restriction in the case of
the game tree is that for any perfect line of play (starting from any node in the tree), the number
of such decision points is limited. Farhi et al. [6] showed that total NAND trees can be evaluated
in $O(\sqrt{N})$ quantum queries, where $N$ is the number of leaves in the tree (corresponding to the
number of possible paths of play in the game tree model). This is a polynomial speedup over the
best classical algorithm, which requires $\Omega(N^{0.753})$ queries [12]. One would expect that a tree with
few decision points would be easy to evaluate both quantumly and classically, and we will show
that this is indeed the case.

For the NAND/game tree, suppose we only allow inputs where on every path from root to leaf,
there is exactly one decision point, and on every path, they always occur after the same number of
steps. Then the oracle for this game tree is also a valid oracle for the 1-level RFS problem (with
some further restrictions on whether at the decision point moving right or left will cause you to win).
Recall that in the 1-level RFS, you are given a function $f : \{0,1\}^n \to \{0,1\}$ such that $f(x) = x \cdot s$
where $s$ is a hidden string. The goal of the problem is to output $g(s)$, where $g : \{0,1\}^n \to \{0,1\}$
is another oracle you are given access to. For general RFS, this 1-level problem is composed, and
you are given different oracles $g_c$ for each instance of the basic problem. If we solve the NAND
tree with one decision point on each path, all at the same level, then we are essentially solving
the 1-level RFS problem, but instead of plugging $s$ into an oracle, the output of the NAND tree
directly gives PARITY($s_n$), where $s_n$ is the last bit of $s$.

1 For more on this correspondence, see Scott Aaronson's blog, Shtetl-Optimized, "NAND now for something
completely different," http://www.scottaaronson.com/blog/?p=207.

While our problem is related to RFS, there are some significant differences. First, the analogous 
basic problem in our case is most naturally written as a sequence of boolean gates, with the 
restriction on inputs formulated in terms of these gates. Second, in our problem, the output of 
each level is a bit that is then applied directly to the next level of recursion, rather than using 
the unnatural convention of oracles for each level of recursion. This lack of oracles makes proving 
the classical lower bound harder, as a classical algorithm could potentially use partial information 
on an internal node. Finally, this composed structure is only apparent in the classical bound; 
our quantum algorithm applies to inputs such that the problem can't be decomposed into discrete 
layers, as in RFS. 

We will in general consider evaluation trees made up of what we call direct boolean functions.
We will define such functions in Sec. 2, but one major subclass of direct boolean functions is
threshold functions and their negations. We say that $f$ is a threshold function if there exists $h$ such
that $f$ outputs 1 if and only if at least $h$ of its inputs are 1. So the NAND function is a negation of
the threshold function with $c = h = 2$. A commonly considered threshold function is the 3-majority
function (3-MAJ) with $c = 3$ and $h = 2$.

We will now describe our allowed inputs to the evaluation tree. We first classify the nodes of
the tree as follows: for threshold (and negation of threshold) functions, each non-leaf node in the
tree is classified as trivial if its children are either all 0's or all 1's, and as a fault otherwise.²
So from our NAND/game tree example, decision points are faults. Note that trivial nodes are easy to
evaluate classically, since evaluating one child gives the value at the node if the node is known to
be trivial. It turns out that they are also easy to evaluate quantumly. We next classify each child
node of each non-leaf node as either strong or weak. If the output of a threshold function is 1, then
the strong child nodes are those with value 1, otherwise they are those with value 0. If a node is
trivial then all children are strong. Classically, the strong child nodes alone determine the value at
the node, and we will see that they matter more in computing the cost of the quantum algorithm
as well.

Our promise is then that the leaves have values such that the tree satisfies the $k$-faults condition:

Definition 1.1. ($k$-fault Tree) Consider a $c$-ary tree of depth $n$, where throughout the tree nodes
have been designated as trivial or fault, and the child nodes of each node have been designated as
strong or weak in relation to their parent. All nodes (except leaves) have at least 1 strong child
node. For each node $d$ in the tree, let $G_d$ be the set of strong child nodes of $d$. Then to each node
$d$, we assign an integer $\kappa(d)$ such that:

• $\kappa(d) = 0$ for leaf nodes.

• $\kappa(d) = \max_{b \in G_d} \kappa(b)$ if $d$ is trivial.

• Otherwise $\kappa(d) = 1 + \max_{b \in G_d} \kappa(b)$.

A tree satisfies the $k$-faults condition if $\kappa(d) \le k$ for all nodes $d$ in the tree.

In particular, any tree such that any path from the root to a leaf encounters at most $k$ fault nodes is
a $k$-fault tree.
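
To make Definition 1.1 concrete, here is a small Python sketch for the special case of a complete binary NAND tree, using the threshold-function conventions from the introduction: a node is trivial when its two children agree, and at a fault the strong child is the one with value 0. The function name `nand_kappa` and its return convention are our own illustrative choices.

```python
from typing import List, Tuple


def nand_kappa(leaves: List[int]) -> Tuple[int, int, int]:
    """Return (value, kappa, max_kappa) for the root of a binary NAND tree.

    kappa follows Definition 1.1: 0 at leaves, the maximum over strong
    children at a trivial node, and one more than that maximum at a fault.
    max_kappa is the largest kappa over all nodes of the subtree, which is
    the quantity the k-faults condition bounds.
    The number of leaves is assumed to be a power of 2.
    """
    if len(leaves) == 1:
        return leaves[0], 0, 0
    half = len(leaves) // 2
    lv, lk, lmax = nand_kappa(leaves[:half])
    rv, rk, rmax = nand_kappa(leaves[half:])
    value = 0 if (lv and rv) else 1
    if lv == rv:
        # Trivial node: both children are strong.
        kappa = max(lk, rk)
    else:
        # Fault: only the child with value 0 is strong.
        kappa = 1 + (lk if lv == 0 else rk)
    return value, kappa, max(kappa, lmax, rmax)


example = nand_kappa([0, 1, 1, 1])
```

In this example the root is a fault whose weak child is also a fault, so the leftmost paths pass through two faults while every node still has $\kappa \le 1$. Counting faults only through strong children can therefore give a smaller number than counting faults along every root-to-leaf path.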

The main theorem we prove is 

2 In Section 2 we will define trivial and fault nodes for non-threshold direct functions, and also see that it is
possible for other inputs to be trivial as well, but the all 0's and all 1's inputs are always trivial.
Theorem 1.2. (Main Result) Given a $k$-fault depth $n$ tree with each node evaluating a fixed direct
boolean function $f_D$, we can create an algorithm based on a span program $P_D$ corresponding to $f_D$
that evaluates the root of the tree with $O(n^2 \omega^k)$ queries, for some constant $\omega$ which depends only
on $f_D$.

A formula for $\omega$ will be given in Section 2.



To prove the most general version of Theorem 1.2 we use the span-program based quantum
algorithm of Reichardt and Spalek [10]. However, in Appendix A we show that the original
algorithm of Farhi et al. [6] obtains a similar speedup, with the correct choice of parameters.

We use the quantum algorithm of Reichardt and Spalek [10], which they only apply to total
boolean evaluation trees (for a polynomial speed-up). With our promise on the inputs, a tweak
to their algorithm gives a super-polynomial speed-up. Their algorithm uses phase estimation of a
quantum walk on weighted graphs to evaluate boolean formulas. This formulation requires choosing
a graph gadget to represent the boolean function, and then composing the gadget many times. The
optimal gadget given the promise on the inputs may be different from the one used in their original
paper. While the original gadget is chosen to optimize the worst case performance on any input, we
should choose a possibly different gadget which is very efficient on trivial nodes. This may increase
the complexity at faults, but this increase is bounded by the $k$-faults condition.

We also present a corresponding classical lower bound: 

Theorem 1.3. (Classical Lower Bound) Let $B$ be a (classical) randomized algorithm which finds
the value of a depth $n$ tree composed of direct boolean functions, satisfying the $k$-faults condition
for some $k < \mathrm{polylog}(n)$. If $B$ is correct at least 2/3 of the time on any distribution of inputs, then
the expected number of queries it makes is at least

$$\Omega\left((\log(n/k))^{k}\right) = \Omega\left(2^{k \log\log(n/k)}\right).$$

When $k = \log n$ the number of queries is $n^{\Omega(\log\log n)}$, whereas the quantum running time is
$O(n^2 \omega^{\log n}) = O(n^{2+\log\omega})$, which gives the speedup.
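
The substitution $k = \log n$ behind this comparison can be written out as follows. This is only a worked restatement of the two bounds above, under the assumption (ours) that all logarithms are taken base 2.

```latex
\begin{align*}
\text{classical:}\quad
  \bigl(\log(n/k)\bigr)^{k}\Big|_{k=\log n}
  &= 2^{\log n \cdot \log\log(n/\log n)}
   = n^{\log\log(n/\log n)}
   = n^{\Omega(\log\log n)}, \\
\text{quantum:}\quad
  n^{2}\,\omega^{k}\Big|_{k=\log n}
  &= n^{2}\, 2^{\log\omega \cdot \log n}
   = n^{2+\log\omega}.
\end{align*}
```

The quantum bound is polynomial in $n$ because $\omega$ is a constant, while the classical bound has an exponent that grows with $n$.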

In Section 2 we will describe our quantum algorithm for $k$-fault direct boolean evaluation trees,
based on the algorithm of Reichardt and Spalek [10], and prove Theorem 1.2. In Section 3 we will
sketch the proof of Theorem 1.3 for the case of the NAND tree. Full details of the lower bound
proof can be found in the appendix.



2 Quantum Algorithm 

2.1 Span-Program Algorithm 

Our algorithm is based on the formulation of [10], which uses span programs. In this subsection, 
we will define span programs and the witness size, a function that gives the query complexity of an 
algorithm derived from a given span program. 

Span programs are linear algebraic ways of representing a boolean function. We will define 
direct boolean functions, which are the functions we use in our evaluation trees, based on their 
span program representations. (For a more general definition of span programs, see Definition 2.1
in [10].)

Definition 2.1. (Direct Boolean Function, defined in terms of its span program representation,
adapted from Definition 2.1 in [10]) Let a span program $P_D$, representing a function $f_D(x)$, $x =
(x_1, \ldots, x_c)$, $x_j \in \{0,1\}$, consist of a "target" vector $t$ and "input" vectors $v_j : j \in \{1, \ldots, c\}$, all
of which are in $\mathbb{C}^C$ ($C \in \mathbb{N}$). Without loss of generality, we will always transform the program so
that $t = (1, 0, \ldots, 0)$. Each $v_j$ is labeled by $\chi_j$, where $\chi_j$ is either $x_j$ or $\bar{x}_j$ depending on the specific
function $f_D$ (but not depending on the specific input $x$). The vectors $v_j$ satisfy the condition that
$f_D(x) = 1$ (i.e. true) if and only if there exists a linear combination $\sum_j a_j v_j = t$ such that $a_j = 0$
if $\chi_j$ is 0 (i.e. false). We call $A$ the matrix whose columns are the $v_j$'s of $P_D$: $A = (v_1, \ldots, v_c)$.
Any function that can be represented by such a span program is called a direct boolean function.

Compared to Definition 2.1 in [10], we have the condition that each input $x_j$ corresponds to
exactly one input vector; this "direct" correspondence gives the functions their name. As a result,
for each direct boolean function there exist two special inputs, $x^0$ and $x^1$, such that $x^0$ causes all
$\chi_j$ to be 0, and $x^1$ causes all $\chi_j$ to be 1. Note this means $x^0$ and $x^1$ differ at every bit, $f(x^0) = 0$,
$f(x^1) = 1$, and $f$ is monotonic on every shortest path between the two inputs (that is, all paths of
length $c$).

Threshold functions with threshold $h$ correspond to direct span programs where $\chi_j = x_j$ for all
$j \in [c]$, and where for any set of $h$ input vectors, but no set of $h - 1$ input vectors, there exists
a linear combination that equals $t$. It is not hard to show that such vectors exist. For threshold
functions, $x^0 = (0, \ldots, 0)$ and $x^1 = (1, \ldots, 1)$.
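
As an illustration of such vectors, the sketch below checks one candidate span program for 3-MAJ ($c = 3$, $h = 2$). The matrix `A` is a hypothetical example of ours, not taken from [10]: any two of its columns span the plane, while no single column is proportional to $t = (1, 0)$. The helper `span_program_value` is likewise our own name, and it decides membership of $t$ in the span numerically with a least-squares solve.

```python
import itertools

import numpy as np

# Hypothetical span program for 3-MAJ: the columns are the input vectors v_j.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
t = np.array([1.0, 0.0])


def span_program_value(A: np.ndarray, t: np.ndarray, x) -> int:
    """Return 1 iff t is a linear combination of the columns v_j with x_j = 1."""
    cols = A[:, np.array(x, dtype=bool)]
    if cols.shape[1] == 0:
        return 0  # no vector is available and t is nonzero
    coeffs, *_ = np.linalg.lstsq(cols, t, rcond=None)
    return int(np.allclose(cols @ coeffs, t))


# The program should agree with the majority function on all 8 inputs.
values = {x: span_program_value(A, t, x)
          for x in itertools.product([0, 1], repeat=3)}
```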

Any span program can lead to a quantum algorithm. For details, see Section 4 and Appendix
B in [10]. The general idea is that a span program for a function $f$, with an input $x$, gives an
adjacency matrix for a graph gadget. When functions are composed, one can connect the gadgets
to form larger graphs representing the composed functions. These graphs have zero eigenvalue
support on certain nodes only if $f(x) = 1$. By running phase estimation on the unitary operator for
a quantum walk on the graph, one can determine the value of the function with high probability.

Determining the query complexity of this algorithm depends on the witness size of the span 
program: 

Definition 2.2. (Witness Size, based on Definition 3.6 in [10]) Given a direct boolean function
$f_D$, corresponding span program $P_D$, inputs $x = (x_1, \ldots, x_c)$, $x_j \in \{0,1\}$, and a vector $S$ of costs,
$S \in [0, \infty)^c$, $S = (s_1, \ldots, s_c)$, let $r_i \in \mathbb{C}^c$, $i \in \{0, \ldots, C-1\}$ be the rows of $A$ and $\chi_j$ correspond to
the columns of $A$ (as in Definition 2.1). Then the witness size is defined as follows:

• If $f_D(x) = 1$, let $w$ be a vector in $\mathbb{C}^c$ with components $w_j$ satisfying $w^\dagger r_0 = 1$, $w^\dagger r_i = 0$ for
$i \ge 1$, and $w_j = 0$ if $\chi_j = 0$. Then

$$\mathrm{wsize}_S(P_D, x) = \min_w \sum_j s_j |w_j|^2. \qquad (1)$$

• If $f_D(x) = 0$, let $w$ be a vector that is a linear combination of the $r_i$, with the coefficient of $r_0$ equal to 1,
and with $w_j = 0$ if $\chi_j = 1$. Then

$$\mathrm{wsize}_S(P_D, x) = \min_w \sum_j s_j |w_j|^2. \qquad (2)$$

Claim 2.1. This definition is equivalent to the definition of witness size given in Definition 3.6 in
[10]. The reader can verify that we've replaced the dependence of the witness size on $A$ with a more
explicit dependence on the rows and columns of $A$. For the case of $f(x) = 0$ we use what they call
$A^\dagger w$ as the witness instead of $w$.

Notation: If $S = (1, \ldots, 1)$, we leave off the subscript $S$ and write $\mathrm{wsize}(P_D, x)$.
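
The two cases of Definition 2.2 can be explored numerically. The sketch below is a minimal illustration under several assumptions of ours: the matrix is real, all costs are 1, and the example matrix `A_and` is a hypothetical span program for the 2-bit AND function with $\chi_j = x_j$. The function name `witness_size` is also ours. The $f_D(x) = 1$ case is a minimum-norm solve; the $f_D(x) = 0$ case is an equality-constrained least-squares problem, solved here through its KKT linear system.

```python
import numpy as np


def witness_size(A, chi):
    """Return (f(x), wsize) with unit costs for a real span program matrix A.

    chi[j] is the value (0 or 1) taken by the label of column j on input x.
    The target is t = (1, 0, ..., 0), as in Definition 2.1.
    """
    A = np.asarray(A, dtype=float)
    C = A.shape[0]
    on = np.asarray(chi) == 1
    t = np.zeros(C)
    t[0] = 1.0

    # Case f(x) = 1: minimum-norm w supported on available columns, A w = t.
    if on.any():
        w, *_ = np.linalg.lstsq(A[:, on], t, rcond=None)
        if np.allclose(A[:, on] @ w, t):
            return 1, float(w @ w)

    # Case f(x) = 0: w = sum_i u_i r_i with u_0 = 1 and w_j = 0 on available
    # columns. Minimise |G u|^2 subject to H u = h via the KKT system.
    G = A[:, ~on].T
    H = np.vstack([t, A[:, on].T])
    h = np.zeros(H.shape[0])
    h[0] = 1.0
    m = H.shape[0]
    K = np.block([[2.0 * G.T @ G, H.T], [H, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(C), h])
    sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    w = A.T @ sol[:C]
    return 0, float(w @ w)


# Hypothetical AND_2 program: r_0 has unit norm and is orthogonal to r_1.
A_and = np.array([[2 ** -0.5, 2 ** -0.5],
                  [1.0, -1.0]])
sizes = {x: witness_size(A_and, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

For this toy program the inputs $(0,0)$ and $(1,1)$ have witness size 1, while the two mixed inputs have witness size 2.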
Now we will introduce a quantity called the subformula complexity. To simplify this paper,
we will not go into the precise definition, which is not important for us, but rather focus on the
relations between this quantity, the witness size of a function, and the query complexity of that
function. (If you would like to know more about the subformula complexity, see Definitions 3.1 and
3.2 in [10]. The following is adapted from Sections 3 and 4.3 in [10].)

Suppose we have a complete boolean evaluation tree. We choose $|E|$ such that $1/|E|$ is larger
than the query complexity of the full tree. (Note that since the query complexity is normally
somewhat large, $|E|$ is a fairly small quantity.) We choose a definite value for $|E|$ later, after
partially calculating the query complexity. Consider a subtree within the larger tree that evaluates
the function $h(x)$, where $x$ are literal inputs (i.e. the inputs are not the outputs of other functions).
If $h$ has span program $P_h$, then the subformula complexity $z$ of $h(x)$ is related to the witness size
by

$$z \le c_1 + \mathrm{wsize}(P_h, x)(1 + c_2|E|), \qquad (3)$$

where $c_1$ and $c_2$ are constants. The function $h$ is rooted at some node $v$. We will often call the
subformula complexity of $h$ the complexity of the node $v$. If $v$ is a leaf (i.e. a literal input), it has
subformula complexity $z = 1$.

Now we want to consider the subformula complexity of a composed formula: $f =
g(h_1(x^1), \ldots, h_c(x^c))$. Let $P_g$ be a span program for $g$. Let $z_j$ be the subformula complexity of
$h_j(x^j)$, and $Z = (z_1, \ldots, z_c)$. Then the subformula complexity of $f$ is bounded by

$$z \le c_1 + \mathrm{wsize}_Z(P_g, x)\,(1 + c_2|E| \max_j z_j)$$
$$\;\; \le c_1 + \mathrm{wsize}(P_g, x) \max_j z_j\,(1 + c_2|E| \max_j z_j). \qquad (4)$$

Using this iterative formula, we can upper bound the subformula complexity at any node in our
formula. We now choose $E$ such that $|E| \ll 1/z_v$ for all $v$, where $z_v$ is the subformula complexity at
node $v$. Then there exists (Section 4.4 from [10]) a quantum algorithm to evaluate the entire tree
using $O(1/|E|)$ queries to the phase-flip input oracle

$$O_a : |b, i\rangle \mapsto (-1)^{b \cdot a_i} |b, i\rangle, \qquad (5)$$

where $a_i$ is the value assigned to the $i$-th leaf of the tree [10].
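
For intuition about the oracle in Eq. 5, the following sketch writes $O_a$ as an explicit diagonal matrix for a small list of leaf values. The function name `phase_flip_oracle` and the ordering of basis states (index $b \cdot N + i$ for $N$ leaves) are conventions of ours chosen for illustration.

```python
import numpy as np


def phase_flip_oracle(a):
    """Diagonal matrix of O_a in the basis |b, i>, ordered as b * len(a) + i."""
    a = np.asarray(a, dtype=int)
    b = np.array([0, 1])[:, None]            # control bit b
    phases = (-1.0) ** (b * a[None, :])      # entry (b, i) is (-1)^(b * a_i)
    return np.diag(phases.reshape(-1))


# Three leaves with values a = (1, 0, 1).
O = phase_flip_oracle([1, 0, 1])
```

The $b = 0$ block acts as the identity, and the $b = 1$ block flips the sign exactly on the leaves with $a_i = 1$, so applying the oracle twice gives the identity.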



2.2 k-fault Trees

In this section, we will create a span program-based quantum algorithm for a $k$-fault tree composed
of a single direct boolean function with span program $P_D$, which requires $O(n^2 \omega^k)$ queries, where
$\omega = \max_x \mathrm{wsize}(P_D, x)$. First, we will be more precise in our definitions of trivial, fault, strong, and
weak. Suppose we have a tree $T$ composed of the direct boolean function $f_D$, represented by the
span program $P_D$. Then,

Definition 2.3. (Trivial and Fault) A node in $T$ is trivial if it has input $x$ where $\mathrm{wsize}(P_D, x) = 1$.
A node in $T$ is a fault if it has input $x$ where $\mathrm{wsize}(P_D, x) > 1$.

In calculating the query complexity of a boolean evaluation tree, we multiply the witness sizes of
the individual functions. Thus, to first order, any node with wsize = 1 doesn't contribute to the
query complexity, and so is trivial. We will show later (Thm. 2.5) that for direct boolean functions
we can always create a span program that is trivial for inputs $x^0$ and $x^1$.

Definition 2.4. (Strong and Weak) Let a gate in $T$ have inputs $x = (x_1, \ldots, x_c)$, and input labels
$(\chi_1, \ldots, \chi_c)$. Then the $j$-th input is strong if $f_D(x) = \chi_j$, and weak otherwise.

It should be clear that this definition of strong and weak agrees with the one for threshold
functions given in the introduction.

Claim 2.2. The costs $s_j$ corresponding to weak inputs do not affect the witness size.

Proof. From the definition of witness size (Definition 2.2), for $f_D(x)$ equal to 0 or 1, we require $w_j = 0$
for all $j$ where $\chi_j \ne f_D(x)$. This means $s_j$ is multiplied by 0 for all such $j$, and therefore does not
affect the witness size. □



Now we can restate and prove Theorem 1.2.

Theorem 1.2. (Main Result) Given a $k$-fault depth $n$ tree with each node evaluating a fixed direct
boolean function $f_D$, we can create an algorithm based on a span program $P_D$ corresponding to $f_D$
that evaluates the root of the tree with $O(n^2 \omega^k)$ queries, for some constant $\omega$ which depends only
on $f_D$.

Proof. The basic idea is that to a first approximation, the query complexity can be calculated
by considering one path from root to leaf, taking the product of all of the $\mathrm{wsize}(P_D, x)$ that that
path hits, and then taking the maximum over all paths. (This can be seen from the second line of
Eq. 4.) The condition on the maximum number of faults along each path then gives a bound on
the complexity throughout the tree, corresponding to the factor of $\omega^k$ in the query complexity. It
remains to take into consideration corrections coming from $c_1$ and $c_2$ in Eq. 4.

With our insight into strong and weak inputs, we can rewrite Eq. 4 as

$$z \le c_1 + \mathrm{wsize}(P_D, x)\,\bigl(\max_{\text{strong } j} z_j\bigr)\bigl(1 + c_2|E| \max_j z_j\bigr). \qquad (6)$$

Note this applies to every node in the tree. Let the maximum energy $|E|$ equal $c n^{-2} \omega^{-k}$, where $c$ is
a constant to be determined. We will show that with this value, the term $c_2|E| \max_j z_j$ will always
be small, which allows us to explicitly calculate the query complexity.
We will prove by induction that

$$z \le c'(n \omega^{\kappa})(1 + c_2 c c'/n)^{n} \qquad (7)$$

for each subtree rooted at a node at height $n$, where from Def. 1.1 $\kappa$ is an integer assigned to each
node based on the values of $\kappa$ at its child nodes and whether those nodes are strong or weak. Here
$c'$ is a constant larger than 1, dependent on $c_1$, and $c$ is chosen such that $c_2 c c' \ll 1$ and $c' c \ll 1$.

For leaves, $z = 1 < c'$, so the above inequality holds in the base case. Consider a node with
$\kappa = \eta$ and height $n$. Then all input nodes have height $n - 1$. Weak inputs have $\kappa \le k$ (notice
the weak input subformula complexities only show up in the last term $\max_j z_j$). If the node is
trivial then strong inputs have $\kappa \le \eta$, and if the node is a fault then strong inputs have $\kappa \le \eta - 1$.
Assuming the appropriate values of $z_j$ based on our inductive assumptions, then for the case of a
trivial node, Eq. 6 gives

$$z \le c_1 + c'(n-1)\omega^{\eta}(1 + c_2 c c'/n)^{n-1}(1 + c_2 c c'/n)$$
$$\;\; \le c' n \omega^{\eta}(1 + c_2 c c'/n)^{n}. \qquad (8)$$

Here we see that $c'$ is chosen large enough to be able to subsume the $c_1$ term into the second term.
For fault nodes, the bound on the complexity of inputs has an extra factor of $\omega^{-1}$ compared with
the trivial case in Eq. 8, which cancels the extra factor of $\omega$ from wsize, so the induction step holds
in that case as well.

For a $k$-fault, depth $n$ tree, we obtain $z_0 \le c'(n \omega^{k})(1 + c_2 c c'/n)^{n}$ for all nodes in the tree. Notice
since $|E| = c n^{-2} \omega^{-k}$, $|E| \ll 1/z_0$ for all nodes. Based on the discussion following Eq. 4, this means
that the number of queries required by the algorithm is of order $1/|E|$, which is $O(n^2 \omega^k)$. □






2.3 Direct Boolean Functions 



The reader might have noticed that our quantum algorithm does not depend on the boolean function
being a direct boolean function, and in fact, the algorithm applies to any boolean function since
any boolean function can be represented by a span program for which at least one of the inputs is
trivial. However, if the span program is trivial for only one input, then it is impossible to limit
the number of faults in the tree. Thus to fully realize the power of this promise, we must have a
boolean function $f$ with a span program that is trivial on at least two inputs, $x$ and $y$, such that
$f(x) = 0$ and $f(y) = 1$. This requirement is also necessary to make the problem hard classically.
In this section we will show that direct boolean functions satisfy this condition.

Theorem 2.5. Let $f_D$ be a direct boolean function, and let $P_D$ be a span program representing $f_D$.
Then we can create a new span program $P'_D$ that also represents $f_D$, but which is trivial on the
inputs $x^0$ and $x^1$. This gives us two trivial nodes, one with output 1, and the other with output 0.



Proof. If $P_D$ is our original span program with rows $r_i$ as in Definition 2.2, then we make a new
span program $P'_D$ by changing $r_0$ to be orthogonal to all other $r_i$'s for $i \ge 1$. To do this, we let
$R_1$ be the subspace spanned by the rows $r_i$, $i \ge 1$. Let $\Pi_{R_1}$ be the projector onto that subspace.
Then we take $r_0 \to (I - \Pi_{R_1}) r_0$. Now $r_0$ satisfies $r_i^\dagger r_0 = 0$ for $i > 0$. Looking at how we choose $w$
in Def. 2.2, this transformation does not affect our choice of $w$. (For $f_D(x) = 1$, $w$ has zero inner
product with elements of the subspace spanned by the rows $r_i$, $i \ge 1$, so taking those parts out of
$r_0$ preserves $w^\dagger r_0 = 1$, which is the main constraint on $w$. For $f_D(x) = 0$, $w$ is a sum of $r_i$ with $r_0$
having coefficient 1. But the part of $r_0$ that is not in the subspace $R_1$ will still have coefficient 1,
and we are free to choose the coefficients of the $r_i$ ($i > 0$) terms to make up the rest of $w$ so that it
is the same as before.) Hence there is no effect on the witness size or the function represented by
the span program.

We can now divide the vector space $R$ of dimension $c$ into three orthogonal subspaces:

$$R = (r_0) \oplus R_1 \oplus R_2, \qquad (9)$$

where $(r_0)$ is the subspace spanned by $r_0$, $R_1$ is the subspace spanned by $r_i$ for $i > 0$, and $R_2$ is
the remaining part of $R$. We can also write part of the conditions for $w$ in Claim 2.2 in terms of
these subspaces: if $f_D(x) = 1$, then $w \in (r_0) \oplus R_2$, and if $f_D(x) = 0$ then $w \in (r_0) \oplus R_1$.

In both of these cases, the remaining $w$ with the minimum possible length is proportional to
$r_0$, with length $|r_0|$ and $1/|r_0|$. When $\chi_j = 0$ for all $j$ or $\chi_j = 1$ for all $j$, i.e. for inputs $x^0$ and $x^1$,
there are no further restrictions on $w$, so these are the actual witness sizes. This shows one of the
witness sizes for trivial inputs must be at least 1. Setting $|r_0| = 1$ by multiplying $r_0$ by a scalar, we
can obtain both witness sizes equal to 1. This gives a span program for $f_D$ such that for $x^0$ and
$x^1$, $\mathrm{wsize}(P_D, x) = 1$. □

We have shown that we can make a span program for $f_D$ with inputs $x^0$ and $x^1$ trivial. However,
the final scaling step that sets $|r_0| = 1$ may increase the witness size for other inputs (faults)
compared to the original span program. Since we are limiting the number of faults, this doesn't
hurt our query complexity.
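
The two steps in the proof of Theorem 2.5 (project $r_0$ away from the other rows, then rescale it to unit length) can be sketched in a few lines of NumPy. This is an illustrative sketch for real matrices only; the function name `make_trivial` and the example matrix, a hypothetical 3-MAJ span program, are ours.

```python
import numpy as np


def make_trivial(A):
    """Return a copy of A whose row r_0 is orthogonal to the other rows
    and has unit norm, following the two steps in the proof of Theorem 2.5."""
    A = np.asarray(A, dtype=float).copy()
    r0, R = A[0], A[1:]
    if R.shape[0] > 0:
        # pinv(R) @ R is the orthogonal projector onto span{r_i : i >= 1}.
        r0 = r0 - np.linalg.pinv(R) @ R @ r0
    norm = np.linalg.norm(r0)
    if np.isclose(norm, 0.0):
        raise ValueError("r_0 lies in the span of the other rows")
    A[0] = r0 / norm
    return A


# Hypothetical 3-MAJ span program: columns (1, 1), (1, 2), (1, 3).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
B = make_trivial(A)
```

Both steps are row operations that map the target $t = (1, 0, \ldots, 0)$ to a multiple of itself, which is why the represented function is unchanged.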



3 Classical Lower Bound Sketch 

The classical lower bound is proven by induction. We here give an overview of the proof for the
NAND function. A more formal version (suited for a wider class of functions) with full details is
in Appendix B.
Consider a boolean evaluation tree composed of NAND functions, of depth $n$, with no faults.
The root and leaves all have the same value if and only if $n$ is even. Now, suppose that every node
at height $i$ is faulty, and there are no other faults in the tree. In this case, all nodes at height $i$
have value 1 and the root has value 1 if and only if $n - i$ is even. Since $n$ is known, to determine
the value of the root, an algorithm has to determine the parity of $i$.

Let $T_1$ be the distribution on trees of depth $n$, which first picks $i$, the height of the fault (called
also the split of the tree), uniformly at random,³ and then picks at random, for each node at height
$i$, which one of its children has the value 0, and which one has the value 1. Consider two leaves $u, v$
which the algorithm queries. If their common ancestor has height less than $i$, $u$ and $v$ will have the
same value. If the height is more than $i + 1$, the probability that they have the same value is 0.5,
and if the height is exactly $i$, then the values will differ. Thus, by querying different leaves, the
algorithm performs a (noisy) binary search, which enables it to determine the parity of $i$.
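
The single-split trees described above are easy to generate for small $n$. The sketch below builds the leaves of a depth-$n$ binary NAND tree whose faults are exactly the nodes at a given height $i$, and evaluates the tree by brute force. The function names are ours, and for clarity the split height is passed in explicitly rather than drawn uniformly.

```python
import random


def sample_one_fault_tree(n, i, rng):
    """Leaves (left to right) of a depth-n NAND tree whose faults are
    exactly the nodes at height i, with 1 <= i <= n."""
    if not 1 <= i <= n:
        raise ValueError("the split height must satisfy 1 <= i <= n")

    def leaves_below(height, value):
        if height == 0:
            return [value]
        if height == i:
            # Fault: one child is 0 and the other is 1, in random order.
            first = rng.randint(0, 1)
            kids = (first, 1 - first)
        else:
            # Trivial NAND node: both children carry the negated value.
            kids = (1 - value, 1 - value)
        return [leaf for kid in kids for leaf in leaves_below(height - 1, kid)]

    root_value = 1 if (n - i) % 2 == 0 else 0
    return leaves_below(n, root_value)


def nand_root(leaves):
    """Evaluate a complete binary NAND tree by reading every leaf."""
    level = list(leaves)
    while len(level) > 1:
        level = [0 if (level[j] and level[j + 1]) else 1
                 for j in range(0, len(level), 2)]
    return level[0]


rng = random.Random(0)
n = 5
roots = {i: nand_root(sample_one_fault_tree(n, i, rng)) for i in range(1, n + 1)}
```

In every sample the root value depends only on the parity of $n - i$, which is the quantity a classical algorithm has to learn from its leaf queries.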

It is easy to show that determining the parity of $i$ is not much easier than determining $i$. In fact,
we show that for every algorithm which gets a tree from the above distribution, if the algorithm
queries at most $\beta = \log n/5$ leaves, the probability that it can make a guess which has an advantage
better than $n^{-1/5}$ over a random guess is at most $n^{-1/5}$. Note that this type of bi-criteria guarantee
is necessary: an algorithm could (for example) ask the leftmost leaf, and then $\beta$ leaves which have
common ancestors at heights $1, 2, \ldots, \log n/5$ from that leaf. If the algorithm uses this type of
strategy, it has a small chance to know exactly at which height the fault (the split) occurred: it
will find it in all trees in which it occurs in the first $\beta$ levels. On the other hand, the algorithm
could perform a regular binary search, and narrow down the range of possible heights. This will
give the algorithm a small advantage over a random guess, regardless of the value of $i$.

Since we have algorithms with a super-polynomial number of queries, we cannot apply some 
union bound to say that rare events (in which a dumb algorithm makes a lucky guess and learns 
a value with certainty) never happen. Thus, we must follow rare events carefully throughout the 
proof. 

The distribution we use for trees with multiple faults is recursive. We define a distribution $T_k$
on trees of height $nk$ which obey the $k$-fault rule.⁴ We begin by sampling $T_1$ (remember $T_1$ is the
distribution of the trees of height $n$ which obey the 1-fault rule, and which have faults exactly at
every node at height $i$). Then, we replace each one of the leaves of this depth $n$ tree with a tree of
depth $n(k-1)$, which is sampled from the distribution $T_{k-1}$, where we require that the sampled
tree has the correct value at the root (the value at the root of the tree of depth $n(k-1)$ needs to be
identical to the value of the corresponding leaf of the tree of depth $n$). Note that the tree generated
this way is identical to the one generated if we took a depth $n(k-1)$ tree generated recursively
(sampling $T_{k-1}$), and expanded each one of its leaves to a tree of depth $n$ (sampling $T_1$). However,
it is easier to understand the recursive proof if we think of it as a tree of depth $n$, in which each
leaf is expanded to a tree of depth $n(k-1)$.

We now present the inductive claim, which is also a bi-criteria. Since we will apply the inductive 
assumption to subtrees, we need to consider the possibility that we already have some a priori 
knowledge about the value at the root of the tree we are currently considering. Moreover this prior 
can change with time. Suppose that at a given time, we have prior information which says the 

J For other functions the basic construction still chooses a split, but the split no longer consists of one level of 
faults. Instead there is a constant number of faults in each split. This does not change the asymptotic behavior. The 
appendix uses the more general construction, which does not map to this one for NAND functions (for example, if 
one were to apply the general construction to the NAND function, the split could involve two levels). 

4 As $n$ is much larger than $k$, the assumption that the height of the tree is $nk$ and not $n$ does not alter the complexity
significantly: one can just define $m = n/k$, and the complexity will be $m^{\Omega(\log\log m)}$, which is also $n^{\Omega(\log\log n)}$ for
$k = O(\log n)$.
value of the root of a tree is 1 with some probability $p$. We prove that if the algorithm has
made fewer than $\beta^l$ queries on leaves of the tree, for $l \le k$, then the probability that it has an
advantage greater than c M < 0(2 fc - ,+1 n _3 ( fc - ,+1 )/ 5 ft+^( k ~ 1 )) over guessing that the root is 1 with 
probability $p$, is at most $p_k < O(3n^{-1/5})$. Note that the probability that the algorithm guesses the
root correctly varies as $p$ varies. We require that, with high probability, throughout the whole process
the advantage the algorithm gets by using the information from leaves below the root of the $\mathcal{T}_k$
tree is bounded by $c_{k,l}$.

The proof requires us to study in which cases the algorithm has an advantage greater than
$c_{k,l}$ when guessing a node, although it asked fewer than $\beta^l$ questions. We say that an "exception"
happened at a vertex $v$ if one of the following happened:

I There was an exception in at least two depth n descendants. 

II "The binary search succeeds:" We interpret the queries made by the algorithm as a binary 
search on the value of the root, and it succeeds early. 

III There was one exception at a depth $n$ descendant, when there was some strong prior (far from
50/50) on the value of this child, and the value discovered is different from the value which
was more probable.

Every type III exception is also a type II, and a type I exception requires at least two type I,
II, or III exceptions in subtrees. We show that if the algorithm has an advantage of more than $c_{k,l}$
at a certain node with fewer than $\beta^l$ queries, then an exception must have occurred at that node.

The proof is by induction. First we show that the probability of an exception at a node of height
$nk$ is bounded by $p_k$. The second part bounds $c_{k,l}$, by conditioning on the (probable) event that
there was no exception at height $nk$. Applying the inductive claim with $k$ equal to the maximum
number of faults in the original tree and no a priori knowledge of the root gives the theorem.

The first part of the proof is to bound the probability of exception, $p_k$. For type II exceptions
this bound comes from the probability of getting lucky in a binary search, and does not depend
on the size of the tree. Since type I exceptions depend on exceptions on subtrees, intuitively we
can argue that since the probability of exception on each subtree is small from the $k-1$ case,
the probability for a type I is also small. A technical problem in making this rigorous is that the
algorithm has time to evaluate a few leaves in a large number of subtrees, then choose the ones that
are more likely to give exceptions to evaluate further. Then the inductive bound from the $k-1$
case no longer applies to the subtrees with the most evaluations. The solution used in the paper
is to simulate $A$, the original algorithm, by another algorithm $B$, that combines the evaluations by
$A$ into at most $\beta$ subtrees, and then argue that the probability of exception in $B$ cannot be much
lower than that in $A$. This way, the inductive bound on $p_{k-1}$ can be applied directly.

For type III exceptions, we note that at each step where a type III exception is possible, the 
probability of exception at the subtree is cut by a constant factor C. Intuitively, this means it is 
C times less likely to have a type III exception. Another simulation is used to formulate this proof 
rigorously. 

For the second part, we consider a tree of height $n$ from our distribution rooted at a node $u$
(which need not be the root of the entire tree), where each one of its leaves is a root of a tree of
height $n(k-1)$. Let $L$ denote the set of nodes at depth $n$, which are each roots of trees of height
$n(k-1)$. The algorithm is allowed $\beta^l$ queries, and thus there are at most $\beta$ nodes in $L$ which
are the ancestors of $\beta^{l-1}$ queries, at most $\beta^2$ nodes which are the ancestors of $\beta^{l-2}$ queries, etc.
Suppose on each node $v \in L$ which received between $\beta^{j-1}$ and $\beta^{j}$ queries, we know the information
$c_{k-1,j}$. Suppose also that one of the nodes in $L$ is known, although it was queried an
insufficient number of times (since we condition on the fact that there is no exception at height
$nk$, this assumption is valid). Adding the information we could obtain from all of these possible
queries gives the bound on $c_{k,l}$.

4 Conclusions 

We have shown that a restriction on the inputs to a large class of boolean formulas results in a
super-polynomial quantum speed-up. We did this by examining existing quantum algorithms to find
appropriate restrictions on the inputs. Perhaps there are other promises (besides the $k$-faults
promise) for other classes of functions (besides direct boolean functions) that could be found in
this way. Since we are interested in understanding which restrictions lead to super-polynomial
speed-ups, we hope to determine if our restricted set of inputs is the largest possible set that gives
a super-polynomial speed-up, or whether the set of allowed inputs could be expanded.

The algorithm given here is based on the fact that there is a relatively large gap around zero in 
the spectrum of the adjacency matrix of the graph formed by span programs. That is, the smallest 
magnitude of a nonzero eigenvalue is polynomial in n, or logarithmic in the size of the graph. From 
the analysis of span programs of boolean functions we now have a rather good idea about what 
kind of tree-like graphs have this property. Furthermore Reichardt has found a way to collapse 
such graphs into few level graphs that have this same spectrum property [11], giving some examples 
with large cycles. One problem is to determine more generally which graphs have this property 
about their spectrum. 
A Intuition from the NAND-Tree

For those unfamiliar with span programs, which form the basis of our general algorithm, we will
explain the key ideas in the setting of NAND trees from Farhi et al. [6]. In their algorithm a
continuous time quantum walk is performed along a graph, where amplitudes on the nodes of the
graph evolve according to the Hamiltonian

$$H|n\rangle = -\sum_{n_i:\ \text{neighbors of } n} |n_i\rangle. \tag{10}$$

The NAND function is encoded in a Y-shaped graph gadget, with amplitudes of the quantum state
at each node labelled by $u_1$, $u_2$, $w$ and $z$, as shown in Figure 1.

For an eigenstate of the graph with eigenvalue $E > 0$, associate to each node the ratio of the
amplitude at that node to the amplitude at its parent. For example, in Fig. 1
the ratios associated with the top-most nodes are called input ratios, and have values $y_1(E) = u_1/w$
and $y_2(E) = u_2/w$. Then using the Hamiltonian in Eq. 10, we can find the output ratio (associated
with the central node) $y_0(E) = w/z$ in terms of the input ratios:

$$y_0(E) = -\frac{1}{y_1(E) + y_2(E) + E}. \tag{11}$$

Figure 1: Graph gadget for the NAND function, with amplitudes labelled by $u_1$, $u_2$, $w$ and $z$.

Farhi et al. associated the ratio at a node with the literal value at that node. They showed that
for eigenstates with small enough $E$, the literal value 1 at a node corresponds to a ratio at most
linear in $E$, that is, $0 < y_i(E) < a_i E$ for some $a_i$. The literal value 0 corresponds to a negative
ratio, which has absolute value at least of order $1/E$, written $y_i(E) < -1/(b_i E)$ for some $b_i$. Using
the recursive relation (11), and the correspondence between literal values and ratios, we will show
below that the Y gadget really does correspond to a NAND gate.

We require $E$ small enough so that $E a_i$ and $E b_i$ are small. We call $a_i$ and $b_i$ complexities, and
each input ratio or output ratio has a complexity associated with it. The maximum complexity
seen anywhere in the tree determines the maximum allowed eigenvalue $E_0$, which in turn controls
the runtime of the algorithm. We will determine the complexity at the output ratio of a Y gadget
(the output complexity), given the complexities of the input ratios (the input complexities).

• Input {00}. For simplicity, we assume the input complexities are equal, with $y_1(E) =
y_2(E) = -\frac{1}{bE}$. Applying Eq. 11, $y_0(E) = bE/(2 - bE^2)$. To first order, $y_0(E) = \frac{bE}{2}$. Thus,
the output ratio has literal value 1, with complexity $\frac{b}{2}$. The output complexity is half of the
input complexity.

• Input {11}. For simplicity, we assume the input complexities are equal, with $y_1(E) =
y_2(E) = aE$. So $y_0(E) = -1/(E + 2aE)$. To first order, $y_0(E) = -\frac{1}{2aE}$. Thus, the output ratio
has literal value 0, with complexity $2a$. The output complexity is double the input complexity.

• Input {10}. Then $y_1(E) = aE$ and $y_2(E) = -\frac{1}{bE}$. So $y_0(E) = bE/(1 - (1 + a)bE^2)$. To first
order, $y_0(E) = bE$. Thus, the output ratio has literal value 1, with complexity $b$. The output
complexity equals the input complexity of the 0-valued input.
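
The three cases can be checked mechanically. The following is a minimal sketch, assuming the recursion of Eq. (11) has the form $y_0 = -1/(y_1 + y_2 + E)$, which is the form that reproduces the three closed-form outputs listed above. The function name `output_ratio` and the sample values of `E`, `a` and `b` are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def output_ratio(y1, y2, E):
    """Output ratio of the Y gadget, assuming y0 = -1 / (y1 + y2 + E)."""
    return -1 / (y1 + y2 + E)

# Illustrative values (assumptions): a small eigenvalue and two complexities.
E = Fraction(1, 1000)
a = Fraction(3)
b = Fraction(5)

one = a * E             # literal 1: small positive ratio a*E
zero = -1 / (b * E)     # literal 0: large negative ratio -1/(b*E)

y00 = output_ratio(zero, zero, E)
y11 = output_ratio(one, one, E)
y10 = output_ratio(one, zero, E)

# Effective complexities read off from the output ratios.
complexity_00 = y00 / E           # output literal 1: y = complexity * E
complexity_11 = -1 / (y11 * E)    # output literal 0: y = -1 / (complexity * E)
complexity_10 = y10 / E
```

With exact rational arithmetic the three outputs agree with the closed forms in the list, and the effective complexities come out close to $b/2$ and $b$ for inputs {00} and {10}. For input {11} the exact effective complexity is $1 + 2a$, of which the text keeps the multiplicative part $2a$.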

We will show how the terms introduced in Section 1 (fault, trivial, strong, and weak) apply to
this example. Recall a node with input {00} or {11} is trivial, with both inputs strong. A node
with input {01} or {10} is a fault, with the 0-valued input strong and the 1-valued input weak.

Generally, weak inputs matter less when calculating $k$ from Section 1, and we see in this example
that the input complexity of the weak input does not affect the output complexity (as long as it is
not too large compared to the eigenvalue). Next consider composing NAND functions by associating
the output ratio of one gadget with the input ratio of another gadget. When we only have trivial
nodes, the complexity is doubled and then halved as we move between nodes with input {11}
and those with input {00}, resulting in no overall multiplicative increase in the complexity. So
trivial nodes do not cause a multiplicative increase in the complexity and therefore do not cause a
multiplicative increase in the runtime.

Whenever there is a fault, its strong input must have come from a node with input {11}. This 
node doubles the complexity, and then the fault itself does not increase the complexity. Thus the 
presence of the fault guarantees an overall doubling of the complexity coming from the strong input. 

Based on this analysis, the maximum complexity of a $k$-faults NAND tree is $2^k$, since $k$, as
calculated in Section 1, corresponds in our example to the maximum number of times the complexity
is doubled over the course of the tree. In turn, the complexity corresponds to the runtime, so we
expect a runtime of $O(2^k)$. (That is, the constant $\omega$ is 2 for the NAND tree.) This analysis
has ignored a term that grows polynomially with the depth of the tree, coming from higher order
corrections. This extra term gives us a runtime of $O(n^2 2^k)$ for a depth $n$, $k$-fault NAND tree.



B Classical Lower Bound 

B.1 Lower Bound Theorem for Direct Boolean Functions

Consider a complete $c$-ary tree composed of direct boolean functions $f_D(x_1, \dots, x_c)$. Each node $d$ in
the tree $t$ is given a boolean value $v(t,d) \in \{0, 1\}$; for leaves this value is assigned, and for internal
nodes $d$, it is the value of $f_D(v(t, d_1), \dots, v(t, d_c))$ where $(d_1, \dots, d_c)$ are the child nodes of $d$. If
clear from context, we will omit $t$ and write $v(d)$.
We will prove the following theorem: 

Theorem B.1. Let $f_D : \{0, 1\}^c \to \{0,1\}$ be a direct boolean function that is not a constant
function. Suppose there exist $k_0$ and $n_0$ such that for each $r \in \{0, 1\}$, there is a distribution $\mathcal{G}_r$ of
trees composed of $f_D$ with height $n_0$, root value $r$, and satisfying the $k_0$-faults condition, such that a
priori, all leaves are equally likely to be 0 or 1. Then given $n > n_0$ and $k$ polynomial in $\log n$, for
a tree $t$ composed of $f_D$ with height $k \cdot n$ and satisfying the $(k \cdot k_0)$-faults condition, no probabilistic
classical algorithm can in general obtain $v(t,g)$ with confidence at least 2/3 before evaluating $\beta^k$
leaves, where $\beta = \lfloor \log n/10 \rfloor$, and $\tilde{n} = n - n_0$. (Note $a \cdot b$ denotes standard multiplication.)

An example of $\mathcal{G}_r$ for the NAND tree can be seen in Figure 2.

[Figure 2 image: depth-2 NAND trees with leaf values 1010, 0101, 0011, and 1100.]

Figure 2: Examples of the distributions $\mathcal{G}_r$ for the NAND tree. Each $\mathcal{G}_r$ is composed by choosing
one of the two trees in its distribution with uniform probability. In this example, $k_0 = 1$, $n_0 = 2$.
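
As a sanity check on this example, the sketch below evaluates depth-2 NAND trees on the four leaf strings that survive from Figure 2. Reading the figure's leaf labels as the four strings 1010, 0101, 0011 and 1100 is an assumption, as are the helper names. The fault count used here simply counts every node whose two inputs differ along a root-to-leaf path, which is assumed to be an upper bound on the quantity limited by the faults condition.

```python
def nand(a, b):
    """NAND of two bits."""
    return 1 - (a & b)

def evaluate(leaves):
    """Return (root value, max number of fault nodes on any root-to-leaf path)."""
    if len(leaves) == 1:
        return leaves[0], 0
    half = len(leaves) // 2
    left_val, left_faults = evaluate(leaves[:half])
    right_val, right_faults = evaluate(leaves[half:])
    is_fault = int(left_val != right_val)   # NAND fault: inputs {01} or {10}
    return nand(left_val, right_val), is_fault + max(left_faults, right_faults)

# Assumed reading of the leaf labels in Figure 2.
FIGURE_2_LEAVES = ["1010", "0101", "0011", "1100"]
results = {s: evaluate([int(ch) for ch in s]) for s in FIGURE_2_LEAVES}
```

Under this reading, two of the trees have root value 0 and two have root value 1, each with one fault per path, and within each pair every leaf position takes the values 0 and 1 once, matching the requirement that all leaves are a priori equally likely to be 0 or 1.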

For example, if $k = \lfloor \log n \rfloor$, then the query complexity is $\Omega((\log n)^{\log n}) = \Omega(n^{\log\log n})$, which
gives us a super-polynomial separation from the quantum algorithm. To prove Theorem B.1, we will
construct a 'hard' distribution such that for a tree randomly drawn from this distribution, the best
deterministic algorithm (tailored to this hard distribution) cannot succeed with high confidence.
Then by Yao's minimax principle [15], the best probabilistic algorithm can on average do no better.
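
The equality between the two forms of the bound is a one-line identity. As a sketch, taking logarithms base 2 (the base is an assumption here; any fixed base gives the same identity with the exponent taken in that base):

```latex
\[
(\log n)^{\log n}
  = \left(2^{\log\log n}\right)^{\log n}
  = \left(2^{\log n}\right)^{\log\log n}
  = n^{\log\log n}.
\]
```

Since $\log\log n$ exceeds any fixed constant for large enough $n$, this quantity eventually dominates every fixed power of $n$.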
B.2 Construction of a Hard Distribution

Choose $f_D$, $n_0$, $k_0$ according to Thm. B.1. Fix $n$ such that $n \gg n_0$. For each $k \ge 1$ we construct a
distribution $\mathcal{T}_k$ of trees with height $k \cdot n$ satisfying the $(k \cdot k_0)$-faults rule. For a tree chosen at random
from $\mathcal{T}_k$ we show no deterministic algorithm can obtain the value at the root with probability 2/3
before evaluating $\beta^k$ of the leaves, implying Thm. B.1.



Claim B.l. Given a function fu and a span program adjusted so that x = {0, 1} make the function 

), for $w = \{0, 1\}$, be the literal inputs corresponding to $x = \{0, 1\}$ respectively.
Then given $r \in \{0, 1\}$, $n \ge 1$, there exists a unique tree $t_{r,n}$ with height $n$, root node $g$, and
$v(t_{r,n}, g) = r$, such that for any non-leaf node $d$ in the tree, the values of the child nodes of $d$ are
$(x_{11}, \dots, x_{1c})$ or $(x_{01}, \dots, x_{0c})$. Moreover, $t_{0,n}$ and $t_{1,n}$ differ at every node of the tree.

Proof. Recall that Xj = f° r all J when x = 0, and Xj = 1 f° r all 3 when % = 1. Hence if 
XQj = x'p then xij = x'- because Xj is represented by j or j 

depending on the span program, and 
independent of the input. (That is, if Xj always is the negation of Xj, then if Xj = 1, Xj = 0) and 
if Xj = 0, Xj = 1-) So for a node d in t rin , if d = 1 it must have inputs (in, . . . , xi c ) = (x^, . . . , x' c ) 
and if d = 0, it must have inputs (xoi> • • • > ^Oc) = (#i, ■ • • , a;' c )- As soon as the value of the root is 
chosen, there is no further choice available in $t_{r,n}$, giving a unique tree. Because $x_{0j} = \bar{x}_{1j}$, $t_{0,n}$ and
$t_{1,n}$ differ at every node. □

We construct $\mathcal{T}_k$ by induction on $k$. First we construct the distribution $\mathcal{T}_1$. Randomly choose
a root value $r \in \{0, 1\}$ and a category value $i \in \{1, \dots, \tilde{n}\}$, where $\tilde{n} = n - n_0$. Then up to depth $i$
insert the tree $t_{r,i}$. At each node $d$ at depth $i$, choose a tree according to the distribution $\mathcal{G}_{v(t,d)}$
to be rooted at $d$. The trees in $\mathcal{G}_r$ have height $n_0$, so we are now at depth $(i + n_0)$. At each node $d$
at depth $(i + n_0)$ attach the tree $t_{v(t,d),\tilde{n}-i}$. Since all of the nodes in $t_{r,n}$ are trivial, any path from
the root to the leaves contains at most $k_0$ faults, so $t$ satisfies the $k_0$-faults rule. We call $i$ the level
of the fault, since faults can only occur from depth $i$ to depth $i + n_0$. Let $\mathcal{T}_{1,r,i}$ be a distribution of
trees constructed as above, with root $r$ and category $i$. Then $\mathcal{T}_1$ is a combination of the distributions
$\mathcal{T}_{1,r,i}$ such that each distribution $\mathcal{T}_{1,r,i}$ with $r \in \{0, 1\}$, $i \in \{1, \dots, \tilde{n}\}$ is equally likely.
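
For the NAND tree this construction is short enough to sketch in code. The version below is only an illustration under stated assumptions: the fault blocks are taken from the reading of Figure 2 as the leaf strings 1010, 0101 (root 0) and 0011, 1100 (root 1), so that $n_0 = 2$, and the function names are hypothetical. It returns the leaves of one sampled tree together with its root value and category.

```python
import random

# Assumed reading of Figure 2 for NAND (n0 = 2, k0 = 1): leaf strings of the
# two trees with root value 0 and of the two trees with root value 1.
G = {0: ["1010", "0101"], 1: ["0011", "1100"]}
N0 = 2

def trivial_leaves(value, height):
    """Leaves of the all-trivial NAND tree with the given root value.

    A trivial NAND node with output v has both inputs equal to 1 - v.
    """
    level = [value]
    for _ in range(height):
        level = [1 - v for v in level for _ in range(2)]
    return level

def sample_t1(n, rng):
    """Sample (root value, category, leaves) of one height-n tree."""
    n_tilde = n - N0
    r = rng.randrange(2)
    i = rng.randrange(1, n_tilde + 1)          # category i in {1, ..., n_tilde}
    leaves = []
    for v in trivial_leaves(r, i):             # values of the nodes at depth i
        word = rng.choice(G[v])                # fault block of height n0 below each
        for bit in word:                       # values of the nodes at depth i + n0
            leaves.extend(trivial_leaves(int(bit), n_tilde - i))
    return r, i, leaves

def evaluate_nand(leaves):
    """Evaluate a complete binary NAND tree from its leaves, left to right."""
    level = list(leaves)
    while len(level) > 1:
        level = [1 - (level[j] & level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]
```

Evaluating the sampled leaves with `evaluate_nand` recovers the chosen root value, which is a quick consistency check that the trivial layers above and below the fault block carry the values through unchanged.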

For the induction step, we assume we have a distribution of trees $\mathcal{T}_{k-1}$. Then choosing a tree
from the distribution $\mathcal{T}_k$ is equivalent to choosing a tree $t$ from $\mathcal{T}_1$ and then at each leaf $a$ of $t$,
attaching a tree from $\mathcal{T}_{k-1}$ with root value $v(t, a)$. We call $\mathcal{T}_{k,r,i}$ the distribution of trees in $\mathcal{T}_k$ with
root $r$ and category $i$ in the first $n$ levels of the tree. As in the $\mathcal{T}_1$ case, $\mathcal{T}_k$ is a combination of the
distributions $\mathcal{T}_{k,r,i}$ such that each distribution $\mathcal{T}_{k,r,i}$ with $r \in \{0, 1\}$, $i \in \{1, \dots, \tilde{n}\}$ (recall
$\tilde{n} = n - n_0$) is equally likely. We'll call $\mathcal{T}_{k,r}$ the distribution of trees in $\mathcal{T}_k$ with root $r$.



Before we state the main lemma needed to prove Thm. B.1, we need a few definitions.



Definition B.2. Suppose we have evaluated some leaves on a tree drawn from the distribution $\mathcal{T}_k$.
Let $d$ be a node in a tree, such that $d$ is also the root of a tree $t \in \mathcal{T}_j$. Then $p_{root}(d,r)$ is the
probability that a randomly chosen tree in $\mathcal{T}_{j,r}$ is not excluded as a subtree rooted at $d$, where the
probability is determined solely by the evaluated leaves in $t$, that is, only from leaves descending
from $d$.

For example, if $g$ is the root of the entire tree, then $p_{root}(g, r)$ is the probability that a random
tree from the distribution $\mathcal{T}_{k,r}$ is not excluded, based on all known leaves. If on the other hand, we
look at a node $a$ in $\mathcal{T}_k$ at depth $n$, then $p_{root}(a,r)$ is the probability that a tree from $\mathcal{T}_{k-1,r}$ is not
excluded as the tree attached at $a$, where the decision is based only on evaluated leaves descending
from $a$.

Let $g$ be the root node. Then our confidence in the value at the root of our tree depends on the
difference between $p_{root}(g,0)$ and $p_{root}(g,1)$.

Definition B.3. We define the confidence level as

$$\frac{|p_{root}(g,0) - p_{root}(g,1)|}{p_{root}(g,0) + p_{root}(g,1)}. \tag{12}$$

So our confidence level is 0 if we have no knowledge of the root based on values of leaves
descending from $g$, and our confidence level is 1 if we are certain of the root's value based on these
values.

Our goal will be to bound the confidence level if fewer than a certain number of leaves have been
evaluated. In calculating the confidence level, we use $p_{root}$ rather than the true probability that the
root $g = 0$ because we will be using a proof by induction. Thus we must think of $g$ as being the root
of a subtree that is an internal node in a larger tree. Evaluations on other parts of this larger tree
could give us information about $g$. In fact, if evaluations on the subtree rooted at $g$ are interwoven
with evaluations on the rest of the tree, our outside information about $g$ will change between each
evaluation we make on the $g$-subtree. We could even learn the value of $g$ with near certainty from
this outside information. This outside information could then be used to help us decide which
leaves descending from $g$ to evaluate. However, we will show that the outside information will not
help us much to increase the confidence level considering only leaves descending from $g$.

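Definition B.4. The relation $p \lesssim q$ ($p$ asymptotically smaller than $q$) means $p < q(1 + C\beta^{\alpha}\tilde{n}^{\gamma})$,
for fixed constants $\alpha > 0$, $\gamma < 0$, and $C$ growing polynomially in $k$, where $\tilde{n} = n - n_0$. Similarly
$p \gtrsim q$ means $p > q(1 - C\beta^{\alpha}\tilde{n}^{\gamma})$.

Since $k$ is polynomial in $\log n$ and $\beta = \lfloor \log n/10 \rfloor$ by Theorem B.1, $C\beta^{\alpha}$ is polynomial in $\log n$,
and for $n$ large, $C\beta^{\alpha}\tilde{n}^{\gamma}$ contributes only a small factor.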
Our main lemma is: 

Lemma B.1. For each $1 \le l \le k$, suppose the algorithm made fewer than $\beta^l$ evaluations on a
tree chosen from some weighted combination of $\mathcal{T}_{k,0}$ and $\mathcal{T}_{k,1}$, where the weighting may change due
to outside knowledge gained between each evaluation. Then with probability at least $1 - p_k$, the
confidence level as defined in Def. B.3 is less than $c_{k,l}$. The values of $p_k$ and $c_{k,l}$ are:

$$p_k < 3n^{-1/5}, \qquad c_{k,l} < (8n_0)^{k-l+1}\, n^{-3(k-l+1)/5}\, \beta^{4+6(k-l)}. \tag{13}$$

The part of the lemma with $l = k$ and an equally weighted combination of $\mathcal{T}_{k,0}$ and $\mathcal{T}_{k,1}$ implies
Thm. B.1.

C Proof of Lemma B.1

We will prove Lemma B.1 in this section.

In general, the confidence level as defined in Def. B.3 is too broad to tackle directly. Instead,
we will rewrite the confidence level in terms of another probability $p_{cat}$ ("cat" is for category).

Definition C.1. Let $d$ be a node in a tree drawn from the distribution $\mathcal{T}_k$, such that $d$ is also the
root of a tree $t \in \mathcal{T}_j$. Then $p_{cat}(d,i,r)$ is the probability that a randomly chosen tree in $\mathcal{T}_{j,r,i}$ is not
excluded as a subtree at $d$, where the probability is determined solely by the known values of leaves
descending from $d$. If $d$ is clear from context, then we will simply write $p_{cat}(i,r)$.
For each node $d$ as above, let $S(d) = \sum_{i,r} p_{cat}(d,i,r)$ and $D(d) = \sum_i |p_{cat}(d,i,0) - p_{cat}(d,i,1)|$.
Again the parameter $d$ may be omitted if it is clear from context. Suppose $g$ is the root of the tree.
Then the confidence level can be written as

$$\frac{|p_{root}(g,0) - p_{root}(g,1)|}{p_{root}(g,0) + p_{root}(g,1)}
= \frac{\left|\sum_i p_{cat}(g,i,0) - \sum_i p_{cat}(g,i,1)\right|}{\sum_{i,r} p_{cat}(g,i,r)}
\le \frac{\sum_i |p_{cat}(i,0) - p_{cat}(i,1)|}{\sum_{i,r} p_{cat}(i,r)}
= \frac{D(g)}{S(g)}. \tag{14}$$

Our general approach in proving Lemma B.1 will be to find an upper bound on $D(g)$ and a lower
bound on $S(g)$, thus bounding $D(g)/S(g)$ and hence the confidence level. We will generally refer
to $D(g)$ as $D$ and $S(g)$ as $S$.
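
To make the roles of $S$ and $D$ concrete, here is a small sketch that computes the confidence level and the bound $D/S$ from a table of $p_{cat}$ values. The function name and the input layout (one pair per category $i$) are illustrative assumptions, and the sample numbers are made up for the example.

```python
from fractions import Fraction

def confidence_and_bound(p_cat):
    """Return (confidence level, D/S) for a list of pairs (p_cat(i,0), p_cat(i,1))."""
    sum0 = sum(p0 for p0, _ in p_cat)
    sum1 = sum(p1 for _, p1 in p_cat)
    S = sum0 + sum1
    D = sum(abs(p0 - p1) for p0, p1 in p_cat)
    if S == 0:
        raise ValueError("every candidate tree has been excluded")
    return abs(sum0 - sum1) / S, D / S

# Made-up example with three categories.
example = [
    (Fraction(1, 2), Fraction(1, 4)),
    (Fraction(0), Fraction(1, 2)),
    (Fraction(1, 8), Fraction(1, 8)),
]
confidence, bound = confidence_and_bound(example)
```

In this example the per-category differences partly cancel in the numerator of the confidence level, so the confidence is strictly below $D/S$; the triangle inequality guarantees it can never exceed it.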



C.1 Basis Case: k = 1

The lower bound on $S$ comes from creating an analogy with a binary search. The following lemma
will be used in the base case, and in the induction step:

Lemma C.1. We have an interval of length $A_0$ to be divided into smaller intervals. At each step
$\tau \in \mathbb{Z}^+$, we choose a number $p_\tau$ between 0 and 1. If our current interval is $A_\tau$, then $A_{\tau+1} = p_\tau A_\tau$
with probability $p_\tau$, and $A_{\tau+1} = (1 - p_\tau)A_\tau$ with probability $1 - p_\tau$. Then for $m \in \mathbb{Z}^+$ and $0 < F < 1$,
the probability that $A_m < FA_0$ is less than $2^m F$, regardless of the choices of $p_\tau$.

Proof. We model the process as a binary search with uniform prior. We try to guess a real number
$x$, $0 < x < A_0$, where at each step we guess the value $y_\tau$ and are told whether $x < y_\tau$ or $x > y_\tau$.
Then $A_\tau$ is the size of the interval not excluded after $\tau$ steps, and $p_\tau$ is related to the position of $y_\tau$
within the remaining interval. For each strategy of choosing a sequence of $p_\tau$'s, $\tau \in \{0, \dots, m-1\}$,
where each $p_\tau$ can depend on all previous outcomes and choices of $p_{\tau'}$ for $\tau' < \tau$, there are at
most $2^m$ possible outcome subintervals $A_m$. These subintervals fill the interval $A_0$ completely and
without overlap. Then the condition $A_m < FA_0$ corresponds to $x$ falling in an interval with size
less than $FA_0$. Since there are at most $2^m$ such intervals, the combined length of intervals with size
less than $FA_0$ is less than $2^m FA_0$. Since we started with an interval of length $A_0$, the probability
of being in one of these intervals is less than $2^m F$. □
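
The counting argument in this proof can be checked exactly for small $m$. The sketch below enumerates all $2^m$ outcome sequences of an adaptive strategy and sums the lengths of the final intervals shorter than $F A_0$, normalising $A_0 = 1$. The function and strategy names are illustrative assumptions; a strategy here is any function from the tuple of past outcomes to a fraction strictly between 0 and 1.

```python
from fractions import Fraction
from itertools import product

def lucky_probability(strategy, m, F):
    """Exact probability that A_m < F * A_0 (with A_0 = 1) after m steps."""
    total = Fraction(0)
    for outcomes in product((0, 1), repeat=m):
        length = Fraction(1)
        for step in range(m):
            p = strategy(outcomes[:step])       # may depend on all earlier outcomes
            length *= p if outcomes[step] == 0 else 1 - p
        # Each shrink factor equals the probability of that outcome, so the
        # probability of the whole sequence equals the final interval length.
        if length < F:
            total += length
    return total

def balanced(history):
    """Always split the current interval in half."""
    return Fraction(1, 2)

def skewed(history):
    """An adaptive, increasingly lopsided split (illustrative)."""
    return Fraction(1, 3 + len(history) + sum(history))
```

Whatever strategy is supplied, the returned value stays below $2^m F$, because it is a sum of at most $2^m$ terms that are each smaller than $F$.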

Consider a tree $t$ chosen according to the distribution $\mathcal{T}_1$. We'll call the root node $g$ (so throughout
this section, whenever we write $p_{cat}(i, r)$ we mean $p_{cat}(g, i, r)$). Suppose we have so far evaluated
$\tau - 1$ leaves, and thus excluded a portion of the possible trees in $\mathcal{T}_1$. We are about to evaluate the
$\tau$-th leaf, which we'll call $b$. Let $S_\tau$ be the current value of $S$, and let $S_{\tau+1}$ be the value of $S$ after $b$
is measured.

Let $p_\tau$ be the true probability that the leaf $b$ is 0, taking into account outside knowledge of the
value of $g$. Let $q_\tau$ be the probability that $b = 0$ considering only leaves descending from $g$. Then
with probability $p_\tau$, $b$ will be measured to be 0, in which case $S_{\tau+1} = q_\tau S_\tau$. With probability $1 - p_\tau$,
$b$ will be measured to be 1, in which case $S_{\tau+1} = (1 - q_\tau)S_\tau$. Notice that $S_\tau$ behaves similarly
to $A_\tau$, with $q_\tau$ in the recursive relation where $p_\tau$ is expected. To obtain a lower bound on $S$, we
will show that at each step $\tau$, $S_\tau$ cannot decrease much more than $A_\tau$. In particular we will show
$p_\tau/q_\tau < 2$ for all $\tau$, which means at each step $S_\tau$ decreases by at most twice as much as $A_\tau$. So
after $T$ steps, if $A_T < F$ with at most some probability, then $S < 2^{-T}F$ with at most the same probability.

At step $\tau$ we have some outside knowledge of $v(t,g)$. Let $p_g(r)$ be the probability that $g = r$
considering all available information (thus it reflects the true probability of obtaining $g = r$).
(Note $p_g(r)$ can change at each $\tau$ as the outside information changes.) Let $p_{b|r}$ be the conditional
probability that a tree not excluded from $\mathcal{T}_{1,r}$ has $v(b) = 0$. $p_{b|r}$ is calculated based only on leaves
evaluated in $t$ (i.e. using no outside information). Our confidence level is at most $c_{1,1}$ (or otherwise
there is no need to evaluate further), so $|p_{root}(g, r) - 1/2| < c_{1,1}/2$, which is asymptotically small.
Up to this small term, we can replace $p_{root}(g, r)$ with 1/2. Then

$$\frac{p_\tau}{q_\tau}
= \frac{p_g(0)p_{b|0} + p_g(1)p_{b|1}}{p_{root}(g,0)p_{b|0} + p_{root}(g,1)p_{b|1}}
\approx \frac{p_g(0)p_{b|0} + p_g(1)p_{b|1}}{(p_{b|0} + p_{b|1})/2}
\le 2. \tag{15}$$

The last inequality holds because $p_g(0) + p_g(1) = 1$, so the numerator is at most
$\max(p_{b|0}, p_{b|1}) \le p_{b|0} + p_{b|1}$.

So at each step $\tau$, we divide $S$ by at most an extra factor of 2 in relation to $A_\tau$. We emphasize
that here we assumed nothing about the value of $p_g(0)$. Indeed it will become crucial later that for
the purpose of bounding the decrease in $S$, the values $p_g(0)$ can be anything.



Now we apply Lemma C.1. At the beginning of the computation, we have $p_{cat}(i, 0) = p_{cat}(i, 1) =
1$ for all $1 \le i \le \tilde{n}$ (with $\tilde{n} = n - n_0$), and so $S = 2\tilde{n}$. Thus we consider $A_0 = 2\tilde{n}$,
$m = \beta = \lfloor (\log n)/10 \rfloor$ and $F = n^{-3/10}$. Then $A_m < 2n^{7/10}$ with probability less than $2^m F = n^{-1/5}$.
From our extra factor of 2 above, over the $\beta$ steps, we see that $S$ is smaller than $A_m$ by at most a
factor of $n^{1/10}$, so $S < 2n^{3/5}$ with probability less than $n^{-1/5}$. In other words, $S > 2n^{3/5}$ with
probability greater than $1 - n^{-1/5}$. We call this procedure the division process. If $A_m < 2n^{7/10}$ (which
will happen with small probability), then we say the division process was lucky.

For the bound on $D$, we will look at how the $p_{cat}$ are updated after each evaluation of a leaf.
Immediately after the first leaf is evaluated, all $p_{cat}$ become 1/2 because for each category $i$ and
root value $r$, any leaf initially has an equal probability of being 0 or 1. Now suppose a leaf $b$ is
evaluated at some later point and $E$ is the set of leaves already evaluated. We have $c^n$ leaves, and
we can index each leaf using the labels $\{1, \dots, c\}^n$, so for example, the labels of two leaves sharing
the same parent node differ only in the last coordinate. Let $h$ be the maximum number of initial
coordinates that agree between $b$ and any element of $E$. Thus $h$ is the depth of the last node where
the path from the root to $b$ breaks off from the paths from the root to already evaluated nodes.
Let $\{e_j\} \subseteq E$ be the leaves in $E$ that share $h$ initial coordinates with $b$. We will now update each
$p_{cat}(i, r)$, depending on the outcome of the evaluation of $b$, and the position of $h$ relative to $i$.

There are three cases: 

• [$i + n_0 \le h$] Here the faults occur at a greater height than the split at $h$. Then $b$ and $\{e_j\}$
are both contained in a subtree of the form $t_{r,j}$, so in such a tree, knowledge of a single node
determines all other nodes. Thus $b$ can be predicted from the values of $\{e_j\}$. If $b$ agrees with
the predicted values, then $p_{cat}(i, r)$ remains the same. Otherwise $p_{cat}(i, r) = 0$, since we learn
that the faults must in fact come after the split at $h$.

• [$i > h$] Now $b$ is the first leaf we evaluate that descends from some node $d$ at level $i$. $b$ has
equal probability of being 0 or 1 (even if we know the value of $d$), so $p_{cat}(i, r)$ is reduced by
half.

• [$i \le h < i + n_0$] Without knowing the details of $f_D$, we cannot know what this part of the
tree looks like. However, the increase in $|p_{cat}(i, 0) - p_{cat}(i, 1)|$ is at most 1 for each $i$. Since
there are $n_0$ $i$'s that satisfy the condition $i \le h < i + n_0$, the total increase in $D$ is at most
$n_0$.

We see the only case that involves an increase in $D$ is the third, with an increase of at most $n_0$
at each evaluation. Since $D$ is 0 at the beginning, and we have at most $\beta$ evaluations, $D \le \beta n_0$.
Combining this with our bound on $S$, we have

$$c_{1,1} \le \frac{D}{S} \le \frac{n_0 \beta}{2n^{3/5}}, \qquad p_{1,1} < n^{-1/5}. \tag{16}$$

This is more stringent than what is required for Lemma B.1, so we have completed the basis case.



C.2 Induction Step for Proof of Lemma B.1
C.2.1 Exceptions 

In the $k = 1$ case, the bound on $p_k$ comes from the probability of getting lucky during the division
process. In the $k > 1$ case, there are several different ways to get "lucky." We will call such
occurrences exceptions. We will first show that the probability of getting an exception is low. Then,
in the final two sections, we will show that as long as an exception does not occur, the confidence
level of the classical algorithm will be bounded according to Lemma B.1.

Consider a tree in $\mathcal{T}_k$, with root $g$ and nodes $a \in L$ at depth $n$. Then the nodes in $L$ are the
leaves of a tree in $\mathcal{T}_1$, and the roots of trees in $\mathcal{T}_{k-1}$ by our construction of $\mathcal{T}_k$. We call the tree in
$\mathcal{T}_k$ the main tree and the trees rooted in $L$ subtrees. We put no restrictions on the order that the
classical algorithm evaluates leaves, but to prove the lower bound, we will think of the algorithm
as trying to find the values of nodes in $L$, and from there, performing the division process on the
$\mathcal{T}_1$ tree as described in the $k = 1$ case.

With this setup, we can describe the three types of exceptions. Type I occurs when exceptions
occur at two or more subtrees. Type II occurs when the division process in the main tree is lucky,
i.e. when $A_\tau < 2n^{7/10}$ for some $\tau \le m$. This can only happen after the division process has started
(that is, after obtaining the second sure value of a node at depth $n$; assuming there are no type I
exceptions, this must take at least $\beta^{k-1}$ evaluations). After the start of the division process, this
occurs with probability at most $n^{-1/5}$. Type III occurs when an exception occurs in one subtree,
but with special conditions, which will be explained below. It will be shown in Section C.2.3 that
$S$ cannot be too small if none of the above has happened.

C.2.2 Bound on $p_k$

In this section we will bound the probability of learning the value at the root of the tree when an 
exception occurs. 

First we will define Type III exceptions. 

Definition C.2. Let $b$ be a leaf descending from $a \in L$, with $g$ the root of the tree. If $v(b) = r'$
we know we will have an exception at the subtree rooted at $a$, while if $v(b) = \bar{r}'$ then there is no
exception. Let $p_{partial(a)}$ be the probability that $v(b) = r'$ considering only information from other
evaluated leaves descending from $a$. Let $p_{partial(g)}$ be the probability that $v(b) = r'$ considering only
information from other evaluated leaves descending from $g$ (i.e. no outside knowledge). Then a
Type III exception occurs if $b$ is measured to be $r'$, and if $p_{partial(g)}/p_{partial(a)} < 2\beta^{-3}$. We call
$p_{partial(g)}/p_{partial(a)}$ the bias ratio.

We will see in Section C.2.3 why Type III exceptions harm the confidence level.



Again, the three types of exceptions are: 

• Type I are when exceptions occur at two subtrees rooted in $L$.

• Type II are when the division process is lucky.

• Type III is described in Def. C.2 and is when an exception occurs with a bias ratio less than
$2\beta^{-3}$.

We will only consider the case $l = k$, since if $l < k$, fewer leaves are evaluated and the probability
of having an exception cannot be more than if $l = k$. In proving this, we will use a generalization
of $\mathcal{T}_k$ that is more flexible to use in an induction.

Definition C.3. An element of the set $\tilde{\mathcal{T}}_k$ is a collection of $c^n$ $c$-ary trees from $\mathcal{T}_{k-1}$. Each tree
in the collection is called a subtree. (We call this a subtree because it corresponds roughly to a
subtree in $\mathcal{T}_k$, but notice that subtrees in $\tilde{\mathcal{T}}_k$ are not connected to each other as a part of a larger tree
structure as in $\mathcal{T}_k$. In both $\mathcal{T}_k$ and $\tilde{\mathcal{T}}_k$, a subsubtree is a tree rooted at a depth $n$ below the root of
a subtree.) All roots of subtrees in $\tilde{\mathcal{T}}_k$ have a priori probabilities that are arbitrary, time dependent
(the priors can change after each evaluation of a leaf on any subtree), and unrelated to each other.
For convenience, we call the whole collection of subtrees in $\tilde{\mathcal{T}}_k$ a tree. Then a Type I exception in
such a tree is defined to be two exceptions at two different subtrees, where exceptions at subtrees
are normal exceptions in $\mathcal{T}_{k-1}$. Let $p_{\mathrm{partial}(a)}(r)$ be the probability that a leaf $b$ descending from a
root $a$ of a subtree in $\tilde{\mathcal{T}}_k$ has value $r$ based only on other evaluated leaves descending from $a$. Let
$p_{\mathrm{partial}(g)}(r)$ be the probability that a leaf $b$ descending from a root node $a$ has value $r$ including the
a priori probabilities associated with $a$. Then a Type III exception is defined to be an exception at
a subtree with $p_{\mathrm{partial}(g)}(r)/p_{\mathrm{partial}(a)}(r) < 2/\beta^3$. We call $p_{\mathrm{partial}(g)}(r)/p_{\mathrm{partial}(a)}(r)$ the bias ratio
for trees in $\tilde{\mathcal{T}}_k$. Exceptions in subtrees are exceptions for $\mathcal{T}_{k-1}$ as defined above. As in a normal
$\mathcal{T}_{k-1}$ tree, if an algorithm makes $\beta^{k-1}$ or more queries at a subtree of $\tilde{\mathcal{T}}_k$, the value of the root of
the subtree is given, and exceptions can no longer occur at that subtree.

Note that in the structure $\tilde{\mathcal{T}}_k$, binary values are associated to each node of the subtrees $\mathcal{T}_{k-1}$,
but nowhere else. In particular there is no such thing as the value of the root of $\tilde{\mathcal{T}}_k$. Note also that
there are no Type II exceptions for $\tilde{\mathcal{T}}_k$. The numberings for exceptions are used in analogy with
exceptions for $\mathcal{T}_k$.

We may consider $\tilde{\mathcal{T}}_k$ as a black box that accepts as input the location of the next leaf to query,
and outputs the value of that leaf according to the priors, as well as a new set of priors for all
subtrees. At some time, suppose that the prior that a certain subtree's root has value 0 is $p$
and that it has value 1 is $1 - p$. Further, suppose that based on previous outputs of the black box, the
distribution at that subtree of 0-valued subtrees is $\mathcal{T}_{k-1,0}$ and the distribution of 1-valued subtrees
is $\mathcal{T}_{k-1,1}$. Then the value returned by the black box must be chosen according to the distribution
$p\,\mathcal{T}_{k-1,0} + (1 - p)\,\mathcal{T}_{k-1,1}$. To prove that an algorithm on $\mathcal{T}_k$ has a low probability of exception, we will
need to prove upper bounds on the probability of exceptions for an algorithm running on $\tilde{\mathcal{T}}_k$. For
this purpose, we can assume that the algorithm on $\tilde{\mathcal{T}}_k$ is specifically trying to get such an exception,
and that the priors can change in a way that is beneficial to the algorithm.

Our strategy in proving the bound on the probability of exception in a tree from $\mathcal{T}_k$ is as follows:
we will first show that we can map an algorithm $\mathcal{A}$ acting on a tree from $\mathcal{T}_k$ to an algorithm $\tilde{\mathcal{A}}$
acting on a tree from $\tilde{\mathcal{T}}_k$. The mapping is such that the probability of getting a Type I or Type III
exception is the same in both algorithms. Next we will map $\tilde{\mathcal{A}}$ to an algorithm $\mathcal{B}$, which acts on a
smaller set of subtrees than $\tilde{\mathcal{A}}$. We will show that the probability that $\mathcal{B}$ has exceptions is similar to
the probability that $\tilde{\mathcal{A}}$ has exceptions. Using an inductive proof, we will then bound the probability
of $\mathcal{B}$ having an exception, which in turn bounds the probability of $\tilde{\mathcal{A}}$ having an exception, which
then bounds the probability of $\mathcal{A}$ getting a Type I or III exception. Type II exceptions are treated
slightly differently throughout the analysis, using the fact that the probability of a Type II exception
in a tree from $\mathcal{T}_k$ is always bounded by $n^{-1/5}$ once the division process has started in that tree.

Consider a mapping from a tree in $\mathcal{T}_k$ to a tree in $\tilde{\mathcal{T}}_k$ where any time an algorithm $\mathcal{A}$ evaluates
a leaf on a subtree from a distribution $\mathcal{T}_k$, we copy the value of the leaf to the corresponding leaf
on the corresponding subtree from $\tilde{\mathcal{T}}_k$. (Note $\tilde{\mathcal{T}}_k$ was created so that a tree in $\tilde{\mathcal{T}}_k$ has the same
number of subtrees as a tree in $\mathcal{T}_k$.) Set the a priori probabilities at a root $a$ of a subtree in $\tilde{\mathcal{T}}_k$ to
equal that of the corresponding node $a$ at depth $n$ in $\mathcal{A}$ (this guarantees that the values returned
by $\tilde{\mathcal{T}}_k$ are in the right distribution). We call $\tilde{\mathcal{A}}$ the algorithm acting on $\tilde{\mathcal{T}}_k$. Notice that anytime a
Type I exception occurs in $\mathcal{A}$, it also occurs in $\tilde{\mathcal{A}}$, since if there is an exception at a subtree in one,
there is an identical subtree on the other that also has an exception. Also, anytime $\mathcal{A}$ has a Type
III exception, algorithm $\tilde{\mathcal{A}}$ will also have a Type III exception, since the priors are the same. So
we can conclude that the probability that $\mathcal{A}$ has a Type I or Type III exception is identical to the
probability that $\tilde{\mathcal{A}}$ has a Type I or Type III exception.

Our strategy will be to bound the probability that a tree from $\tilde{\mathcal{T}}_k$ can have an exception of Type
I or III, and then we will use the bound that a Type II exception happens with probability $n^{-1/5}$
once the division process starts (which it will for $\beta^k$ evaluations), for the same reason we get lucky
with probability $n^{-1/5}$ in the $k = 1$ case.

Let $\tilde{\mathcal{A}}$ be as defined above. To get the bounds on the probability of exception, we will simulate
$\tilde{\mathcal{A}}$ using an algorithm $\mathcal{B}$ which acts on $\beta$ trees of type $\tilde{\mathcal{T}}_{k-1}$. Suppose $\tilde{\mathcal{A}}$ evaluates leaves descending
from a subtree rooted at the node $a_i$, and furthermore, evaluates leaves descending from subsubtrees
at depth $n$ below $a_i$, rooted at the nodes $(a_{i1}, a_{i2}, \ldots)$. Then $\mathcal{B}$ randomly picks one of its $\beta$ trees
on which to simulate $a_i$, and each time $\tilde{\mathcal{A}}$ evaluates on a new subsubtree of $a_i$ rooted at $a_{ij}$, $\mathcal{B}$
randomly chooses a new subtree in the tree it has chosen to simulate $a_i$, one that isn't yet being
used for simulation. Then all leaves evaluated on a subsubtree by $\tilde{\mathcal{A}}$ are mapped onto the leaves of
the corresponding subtree in $\mathcal{B}$. This setup will always be possible because the number of evaluated
leaves is small compared to the number of nodes at depth $n$. The result is that each subtree in $\tilde{\mathcal{A}}$
is mapped (not necessarily injectively) to one of the $\beta$ $\tilde{\mathcal{T}}_{k-1}$ trees, while the subsubtrees in $\tilde{\mathcal{A}}$ are
mapped injectively to subtrees of the appropriate $\tilde{\mathcal{T}}_{k-1}$.
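The bookkeeping behind this simulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SimulationMap`, the integer labels for subtrees and subsubtrees, and the parameter `slots_per_tree` (the number of subtrees available in each simulated tree) are hypothetical and do not appear in the text. The sketch assumes, as the text does, that free slots never run out.

```python
import random


class SimulationMap:
    """Hypothetical bookkeeping for the simulating algorithm.

    Each subtree i of the simulated algorithm is assigned to one of `beta`
    trees chosen at random (not necessarily injectively), and each
    subsubtree (i, j) is assigned to a fresh, unused slot in that tree
    (injectively).
    """

    def __init__(self, beta, slots_per_tree, seed=0):
        self.rng = random.Random(seed)
        self.beta = beta
        # Unused slots in each of the beta trees, in random order.
        self.free = []
        for _ in range(beta):
            slots = list(range(slots_per_tree))
            self.rng.shuffle(slots)
            self.free.append(slots)
        self.tree_of = {}  # subtree i -> index of the tree simulating it
        self.slot_of = {}  # (i, j) -> (tree index, slot index)

    def locate(self, i, j):
        """Return the (tree, slot) pair that simulates subsubtree j of subtree i."""
        if i not in self.tree_of:
            self.tree_of[i] = self.rng.randrange(self.beta)
        if (i, j) not in self.slot_of:
            tree = self.tree_of[i]
            # pop() hands out each slot at most once, so the map is injective.
            self.slot_of[(i, j)] = (tree, self.free[tree].pop())
        return self.slot_of[(i, j)]
```

Repeated calls with the same `(i, j)` return the same location, so every leaf evaluated on one subsubtree lands on the same simulated subtree.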

Our strategy will be to show that $\mathcal{B}$ has Type I and Type III exceptions at least as often as $\tilde{\mathcal{A}}$,
and then to bound the probability that $\mathcal{B}$ has an exception. Notice that each of the $\beta$ trees in $\mathcal{B}$
can have at most $\beta^k$ evaluations on it. So we will prove the following lemma:

Lemma C.2. Suppose an algorithm makes less than $\beta^{k+1}$ evaluations on a tree from $\tilde{\mathcal{T}}_k$. Then
the probability of getting one exception on a subtree is $< (\beta^3 + 3\beta^2)\,n^{-1/5}$. The probability of
getting a Type I exception is $< (\beta^3 + 3\beta^2)^2\,n^{-2/5}$. The probability of getting a Type III exception is
$< (1 + 3\beta^{-1})\,2n^{-1/5}$.



Note that we are given $\beta$ times more total evaluations than in Theorem B.1. However, we will
give the algorithm the added advantage that once $\beta^{k-1}$ queries are made on a tree from $\mathcal{T}_{k-1}$, the
algorithm will just return the value of the root of the subtree. This added information will only be
an advantage. Thus, the number of evaluations on each subtree is bounded by $\beta^{k-1}$. Essentially
we are trying to show that the probability of exception also increases by at most a factor of $\beta$. To
prove Lemma C.2, we require one further lemma:
Lemma C.3. Suppose we have a finite sequence of real numbers $\{b_i\}$, such that each $b_i < \beta^{k-1}$ and
$\sum_i b_i < \beta^{k+1}$. Construct a subset $W$ of $\{b_i\}$ by choosing to place each $b_i$ into $W$ with probability
$\beta^{-2}$. Then the probability that $\sum_{b_i \in W} b_i > \beta^k$ is at most of order $1/\beta!$.

Proof. First normalize by letting $b'_i = b_i/\beta^{k-1}$. Then we can describe the sum of elements in $W$ as
a random variable $X$, which is itself the sum of random variables $X_i$. The value of $X_i$ is chosen
to be 0 if $b_i \notin W$ and $b'_i$ if $b_i \in W$. Then the mean $\bar{X}_i$ is less than $\beta^{-2}$ and $\sum_i \bar{X}_i < 1$. Since on
average, at most one of the $X_i$'s will be 1, $X$ is very close to a Poisson distribution with parameter
at most 1. Then the probability that $X > \beta$ is at most of order $1/\beta!$. □
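As a rough numerical illustration of Lemma C.3, the sketch below looks at one natural extreme case, assumed here for illustration only: $\beta^2$ elements whose normalized size is 1, so that $X$ is binomial with $\beta^2$ trials and success probability $\beta^{-2}$. The function name `overflow_probability` is hypothetical. In this case $\binom{n}{j} n^{-j} \le 1/j!$ gives $P(X > \beta) < 1/\beta!$, which the code lets one check for small $\beta$.

```python
from math import comb, factorial


def overflow_probability(beta):
    """P(X > beta) for X ~ Binomial(beta**2, beta**-2).

    Models the assumed extreme case of Lemma C.3 in which beta**2 elements
    of normalized size 1 are each kept with probability beta**-2.
    """
    n = beta * beta
    p = 1.0 / n
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(beta + 1, n + 1)
    )


def poisson_scale(beta):
    """The comparison scale 1/beta! quoted in the lemma."""
    return 1.0 / factorial(beta)
```

Comparing `overflow_probability(beta)` with `poisson_scale(beta)` for a few values of `beta` shows how quickly the overflow probability falls off.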



Now we will prove Lemma C.2.



Proof. The proof will follow from induction on k. For k = 1, there are no subtrees, so the probability 
of exceptions on subtrees is 0. 

For the inductive case, we assume we have an algorithm $\mathcal{D}$ acting on a tree from $\tilde{\mathcal{T}}_k$, such that
$\mathcal{D}$ can evaluate leaves descending from up to $\beta^{k+1}$ subtrees. Notice that once $\mathcal{D}$ evaluates more
than $\beta^{k-1}$ leaves of any subtree, an exception can no longer occur on that subtree and in fact the
algorithm is given the value at the root of the subtree, so we can stop simulating that subtree. It
takes $\beta^{k-2}$ evaluations on any subtree to begin the division process in that subtree, so $\mathcal{D}$ with its
$\beta^{k+1}$ total evaluations has $\beta^3$ opportunities to get an exception of Type II in a subtree. Thus the
probability of getting an exception of Type II in one subtree is at most $\beta^3 n^{-1/5}$.

To determine the probability that $\mathcal{D}$ triggers Type I and III exceptions in subtrees, we will
simulate $\mathcal{D}$ using a new algorithm $\mathcal{F}$. $\mathcal{F}$ evaluates on $\beta^2$ trees of type $\tilde{\mathcal{T}}_{k-1}$. The simulation will work
similarly as above, except $\mathcal{F}$ can evaluate on $\beta^2$ trees instead of $\beta$ trees.

Now suppose $\mathcal{D}$ produces an exception of Type I or Type III on a subtree rooted at $a$. We
show that this implies an exception in the corresponding tree in $\mathcal{F}$. A Type I exception on $a$ means
there are two exceptions on subtrees of $a$ (of type $\mathcal{T}_{k-2}$). Since there is an injective correspondence
on subtrees of type $\mathcal{T}_{k-2}$, there are exceptions on two subtrees of the $\tilde{\mathcal{T}}_{k-1}$ that $a$ is assigned
to. The argument for Type III exceptions is similar. If less than $\beta^k$ evaluations are made on
the corresponding tree in $\mathcal{F}$, then we can use the probabilities from the inductive assumption to
bound the likelihood of exceptions occurring. Furthermore, using Lemma C.3 we will show that
the probability that more than $\beta^k$ evaluations are made on any one of the $\beta^2$ trees used in $\mathcal{F}$ is at
most of order $1/\beta!$.



Let $b_i$ in Lemma C.3 be the number of evaluations $\mathcal{D}$ has made on a subtree rooted at $a_i$. Notice
that all $b_i$ are less than $\beta^{k-1}$, as required, since $\mathcal{D}$ makes no more than $\beta^{k-1}$ evaluations on any
one node in $L$. $W$ then corresponds to the number of evaluations on one of the $\beta^2$ trees that $\mathcal{F}$
evaluates. By Lemma C.3 the probability that the number of evaluations on one of $\mathcal{F}$'s trees is
larger than $\beta^k$ is at most $1/\beta!$. Since $\beta!$ is superpolynomial in $n$, we can ignore the probability of
this occurring, even after multiplying by $\beta^2$ for each of the possible trees used by $\mathcal{F}$.

A Type I or Type III exception at a subtree in $\mathcal{D}$ creates a corresponding exception at a root of one
of $\mathcal{F}$'s trees. Thanks to Lemma C.3, we can use the inductive assumptions, so the probability of one
exception of Type I or III occurring on a tree in $\mathcal{F}$ is $< (1 + 3\beta^{-1})\,2n^{-1/5} + (\beta^3 + 2\beta^2)^2 n^{-2/5} < 3n^{-1/5}$.
Since there are $\beta^2$ trees used by $\mathcal{F}$, the total probability of $\mathcal{F}$ having one exception of Type I or III is
at most $3n^{-1/5}\beta^2$, and therefore this is also a bound on the probability of $\mathcal{D}$ having an exception.
Combining this with the probability of $\beta^3 n^{-1/5}$ of getting a Type II exception in a subtree in $\mathcal{D}$,
we have the first bound in Lemma C.2.

If $P$ is the probability that $\mathcal{D}$ has Type I or III exceptions on two of its subtrees (leading to
a Type I exception), then $\mathcal{F}$ will also have exceptions on two of its trees with probability at least
$P(1 - \beta^{-2})$, since it will have two exceptions unless it happens to simulate both of the subtrees with
exceptions on the same tree, which occurs with probability $\beta^{-2}$. Using the inductive assumption,
the probability of getting two Type I or III exceptions is the square of the probability of getting a
single exception, times the number of ways that two trees can be chosen among $\beta^2$ trees:

$$P(1 - \beta^{-2}) < 9n^{-2/5}\,\frac{\beta^2(\beta^2 - 1)}{2} \;\Rightarrow\; P < \frac{9}{2}\,\beta^4 n^{-2/5}. \tag{17}$$

Combining this probability with the probability of a Type II error, we obtain the second inequality
in Lemma C.2.



For the final inequality in Lemma C.2, regarding exceptions of Type III, we will simulate the
evaluations that $\mathcal{D}$ makes using another algorithm $\mathcal{C}$ that is also evaluating on a tree from $\tilde{\mathcal{T}}_k$.
The value of any leaf that $\mathcal{D}$ evaluates is mapped to the corresponding leaf in $\mathcal{C}$. However, the a
priori probabilities of roots of subtrees in $\mathcal{C}$ are adjusted so that any time $\mathcal{D}$ is about to get a Type
III exception, the bias ratio in $\mathcal{C}$ goes to 1. To get a bias ratio of one, simply make the a priori
probabilities equal. We are free to change the a priori probabilities in this way and still have the
tree be in $\tilde{\mathcal{T}}_k$, and therefore the inductive assumption, as well as the first two bounds in Lemma
C.2, must hold for $\mathcal{C}$ since they hold for all trees in $\tilde{\mathcal{T}}_k$.

Note that if an exception at a subtree rooted at $a$ is possible in $\mathcal{D}$, then it will also be possible in
$\mathcal{C}$. If an exception can happen in $\mathcal{D}$, then conditioning on $v(a) = 0$ or $v(a) = 1$, there is a non-zero
probability that $b$, a leaf descending from $a$, will trigger an exception at $a$. So when we look at an
even likelihood of $v(a) = 0$ and $v(a) = 1$, there will always be some probability that $b$ will trigger
an exception at $a$ in $\mathcal{C}$.



By Lemma C.2, $\mathcal{C}$'s probability of getting one exception at $a$ is just $(\beta^3 + 3\beta^2)\,n^{-1/5}$. Changing
the bias ratio changes the probability that there is an exception at $b$. Let $p_{\mathrm{partial}(a)}$ be the probability
that there is an exception at $a$ due to $b$, only considering information from leaves descending from $a$.
Then using Eq. 15 and the definition of the bias ratio, in $\mathcal{C}$ the probability of getting an exception
at $a$ is less than $2p_{\mathrm{partial}(a)}$. So $2p_{\mathrm{partial}(a)} < (\beta^3 + 3\beta^2)\,n^{-1/5}$. Now in $\mathcal{D}$, by the same reasoning,
the probability of getting an exception at $a$ is less than $4p_{\mathrm{partial}(a)}/\beta^3$. But we can plug in for
$2p_{\mathrm{partial}(a)}$ to get that the probability of exception is $< (1 + 3\beta^{-1})\,2n^{-1/5}$. □

Now we can put this all together to get the bound on $p_k$. As described in the paragraphs
following Def. C.3, we simulate the algorithm $\tilde{\mathcal{A}}$ using an algorithm $\mathcal{B}$, where $\mathcal{B}$ makes evaluations
on only $\beta$ trees from $\tilde{\mathcal{T}}_{k-1}$. First we will determine the probability of getting each type of exception
on one subtree in $\tilde{\mathcal{A}}$. Since it takes $\beta^{k-2}$ evaluations to begin the division process in a subtree,
and we have $\beta^k$ total evaluations, $\tilde{\mathcal{A}}$ has at most $\beta^2$ attempts to get one exception of Type II in a
subtree, for a total probability of $\beta^2 n^{-1/5}$. From Lemma C.2, the $\beta$ trees evaluated by $\mathcal{B}$ each
have a probability of $< (\beta^3 + 2\beta^2)^2 n^{-2/5}$
of having an exception of Type I, and a probability of $< (1 + 3\beta^{-1})\,2n^{-1/5}$ for Type III, which
we know is at least as large as the probability of $\tilde{\mathcal{A}}$ itself having a corresponding exception on a
subtree.

Combining all possibilities, we get that the probability of an exception of any type on one
subtree in $\tilde{\mathcal{A}}$ is asymptotically smaller than

$$\beta^2 n^{-1/5} + \beta\left((1 + 3\beta^{-1})\,2n^{-1/5} + (\beta^3 + 2\beta^2)^2 n^{-2/5}\right) < \beta^3 n^{-1/5}. \tag{18}$$

Finally, we can determine the asymptotic probability of getting Type I and III exceptions each
in $\tilde{\mathcal{A}}$:

• Type I: Using the simulation, this is bounded by the square of the probability of getting a
single exception, times the number of ways two subtrees can be chosen from $\mathcal{B}$'s $\beta$ available
trees, giving $\beta^6 n^{-2/5}\,\beta(\beta - 1)/2$.

• Type III: By the same reasoning as at the end of Lemma C.2, the probability of getting an
exception with bias ratio $2/\beta^3$ is at most $(2/\beta^3)\,\beta^3 n^{-1/5} = 2n^{-1/5}$.

Now our original algorithm $\mathcal{A}$ acting on a tree in $\mathcal{T}_k$ has a probability of Type I and III exceptions
that is at most that of $\tilde{\mathcal{A}}$, but it additionally has an $n^{-1/5}$ probability of a Type II exception.
Combining these three probabilities, we have

$$p_k < n^{-1/5} + 2n^{-1/5} + \beta^6 n^{-2/5}\,\beta(\beta - 1)/2 < 3n^{-1/5}. \tag{19}$$



C.2.3 Bounding S



Now that we've seen that the probability of an exception is low, we will bound the confidence level
assuming no exceptions occur. As in the $k = 1$ case, we will put bounds on $S$ and $D$ rather than
on the confidence level directly. We use an iterative proof, so we assume that Lemma B.1 holds
for $\mathcal{T}_{k-1}$. In fact, to simplify the argument, if $\beta^{k-1}$ leaves descending from a node $a \in L$ have been
evaluated, or if an exception occurred at $a$, we will give the algorithm the correct value of $a$. The
algorithm can do no worse with this information.

By induction, we can apply the confidence bound $c_{k-1,j}$ to nodes in $L$ since they are the roots
of trees from $\mathcal{T}_{k-1}$. While this gives us information about $p_{\mathrm{root}}(a, r)$, what we really care about is
$p_{\mathrm{root}}(g, r)$. To calculate the latter, we define $p_{\mathrm{tree}}(t)$ for $t \in \mathcal{T}_1$ to be the probability that a randomly
chosen tree from $\mathcal{T}_k$ with its first $n$ levels equal to $t$ is not excluded. Then

$$p_{\mathrm{tree}}(t) = \prod_{a \in L} p_{\mathrm{root}}(a, v(t, a)), \tag{20}$$

and we let $p_{\mathrm{cat}}(i, r)$ be the weighted average of $p_{\mathrm{tree}}(t)$ for all $t \in \mathcal{T}_{1,r,i}$. This weighting is due to
uneven weightings of trees in $Q_r$. Trees in $\mathcal{T}_1$ containing less likely trees from $Q_r$ should have less
weight than those containing more likely ones.

From $p_{\mathrm{tree}}$, we can obtain a bound on the change in $p_{\mathrm{cat}}(i, r)$ due to a change in a $p_{\mathrm{root}}(a, r)$.
Suppose due to an evaluation on a leaf, $p_{\mathrm{root}}(a, 0)$ changes to $p_{\mathrm{root}}(a, 0)'$. Then every $p_{\mathrm{tree}}(t)$
where $v(t, a) = 0$ will change by a factor of $p_{\mathrm{root}}(a, 0)'/p_{\mathrm{root}}(a, 0)$. Now each $p_{\mathrm{cat}}$ is a weighted
average of some trees with 0 at $a$, and some with 1 at $a$. So at most, the multiplicative change in
each $p_{\mathrm{cat}}$ due to $p_{\mathrm{root}}(a, 0)$ is equal to $p_{\mathrm{root}}(a, 0)'/p_{\mathrm{root}}(a, 0)$. Then the change in $\log S$ is at most
$\log(p_{\mathrm{root}}(a, 0)'/p_{\mathrm{root}}(a, 0))$, which is just the change in $\log p_{\mathrm{root}}(a, 0)$. We can go back and repeat the
same logic with the value 1 at $a$, to get that the total change in $\log S$ is at most the change in $\log p_{\mathrm{root}}(a, 0)$
plus the change in $\log p_{\mathrm{root}}(a, 1)$. This bound is only useful if $a$ is not learned with certainty.
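The product form of Eq. (20) and the rescaling argument above can be illustrated with a small Python sketch. The dictionary layout used here (node label to a pair of probabilities, and tree to node values) is an assumption made only for the example.

```python
from math import prod


def p_tree(tree_values, p_root):
    """Eq. (20): product over nodes a in L of p_root[a][v(t, a)].

    tree_values maps each node label a to the value v(t, a) in {0, 1};
    p_root maps each node label a to [p_root(a, 0), p_root(a, 1)].
    """
    return prod(p_root[a][v] for a, v in tree_values.items())


def rescale_factor(tree_values, a, r, old, new):
    """Factor by which p_tree(t) changes when p_root(a, r) goes from old to new."""
    return new / old if tree_values[a] == r else 1.0
```

Only trees with $v(t, a) = r$ are rescaled when $p_{\mathrm{root}}(a, r)$ changes, which is why a weighted average over trees changes by at most that same factor.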

Note that the confidence level does not change if we renormalize our probabilities because the
confidence level is a ratio of probabilities. Since $p_{\mathrm{tree}}(t)$ and $p_{\mathrm{root}}(a, r)$ may be extremely small, we
will renormalize $p_{\mathrm{root}}(a, r)$ to make the probabilities easier to work with: we renormalize so that
$p_{\mathrm{root}}(a, 0) + p_{\mathrm{root}}(a, 1) = 2$. We will just need to be careful when we apply the results of the $k = 1$
case, that we take the new normalization into account.

We will now prove several lemmas we will use in bounding S. 

Lemma C.4. Suppose no exception has occurred at $a \in L$, and less than $\beta^l$ leaves descending
from $a$ have been evaluated, and $p_{\mathrm{root}}(a, r)_i$ is the value of $p_{\mathrm{root}}(a, r)$. At some later time, such
that still no exception has occurred at $a \in L$, and less than $\beta^l$ leaves descending from $a$ have been
evaluated, $p_{\mathrm{root}}(a, r)_f$ is the value of $p_{\mathrm{root}}(a, r)$. Then the change in $\log p_{\mathrm{root}}(a, r)$,
$\Delta \log p_{\mathrm{root}}(a, r) = \log p_{\mathrm{root}}(a, r)_f - \log p_{\mathrm{root}}(a, r)_i$, obeys $\Delta \log p_{\mathrm{root}}(a, r) < 2c_{k-1,l}$.

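Proof. By induction, if no exception occurs, and if less than $\beta^l$ leaves descending from $a \in L$ have
been evaluated, then using our normalization condition we have

$$\frac{|p_{\mathrm{root}}(a, 0) - p_{\mathrm{root}}(a, 1)|}{p_{\mathrm{root}}(a, 0) + p_{\mathrm{root}}(a, 1)} < c_{k-1,l} \;\Rightarrow\; |p_{\mathrm{root}}(a, r) - 1| < c_{k-1,l} \tag{21}$$

for $r \in \{0, 1\}$. Then with $p_{\mathrm{root}}(a, r)_f$ and $p_{\mathrm{root}}(a, r)_i$ as defined above, $p_{\mathrm{root}}(a, r)_f = A \cdot p_{\mathrm{root}}(a, r)_i$,
where in order to satisfy Eq. 21,

$$A \in \left[\frac{1 - c_{k-1,l}}{1 + c_{k-1,l}},\; \frac{1 + c_{k-1,l}}{1 - c_{k-1,l}}\right] \;\Rightarrow\; |\log A| < 2c_{k-1,l} \;\Rightarrow\; \Delta \log p_{\mathrm{root}}(a, r) < 2c_{k-1,l}. \tag{22}$$

□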
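The last step of this proof can be checked numerically. The extreme value of $|\log A|$ on the interval in Eq. (22) is $\log\frac{1+c}{1-c} = 2c + \tfrac{2}{3}c^3 + \cdots$, so the bound $2c$ should be read as the leading-order value, with a correction that is negligible when $c_{k-1,l}$ is small. The sketch below (the function name `max_log_change` is hypothetical) evaluates the exact extreme so the two can be compared.

```python
from math import log


def max_log_change(c):
    """Largest |log A| for A in [(1-c)/(1+c), (1+c)/(1-c)], with 0 < c < 1."""
    return log((1 + c) / (1 - c))


def leading_order(c):
    """Leading-order value 2c used in Eq. (22)."""
    return 2 * c
```

For small `c` the difference between `max_log_change(c)` and `leading_order(c)` is of order `c**3`.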



We will use Lemma C.4 when we determine how $S$ changes when less than $\beta^{k-1}$ leaves from a
node have been evaluated.

Type III exceptions are bad because they cause S to be divided by a much larger factor than 
A T during the division process on the main tree, where A T keeps track of the true probabilities 
during the division process, as in the k = 1 case. 

Lemma C.5. If there is an exception in a subtree, but it is not a Type III exception, then the
probability that $S$ changes and the amount that $S$ changes differ by at most a factor of $4\beta^3$.

Proof. We will prove the contrapositive. Without loss of generality, suppose $v(b) = 0$ gives an
exception at $a$, and if $v(b) = 0$, then we know $v(a) = 0$ with high probability. Let $q_r$ be the
probability that $v(a) = r$ including all possible information. Let $q'_r$ be the probability that $v(a) = r$
based only on evaluated leaves descending from $g$, so no outside information. Let the probability
that $v(b) = 0$ conditioned on $v(a) = r$ be $p_{b|r}$.

Conditioning on $v(b) = 0$, the probability that $S$ changes is equal to the probability that
$v(a) = 0$ including all possible information, which by Bayes' rule is $q_0 p_{b|0}/(q_0 p_{b|0} + q_1 p_{b|1})$. Now
the factor by which $S$ changes is simply $q'_0$, since $S$ only depends on nodes descending from $g$. To
obtain a contradiction, we assume that these two terms differ by a lot. In particular, we assume

$$q'_0\,\frac{q_0 p_{b|0} + q_1 p_{b|1}}{q_0 p_{b|0}} < \frac{1}{4\beta^3}. \tag{23}$$

We can rewrite this as

$$q'_0\left(1 + \frac{q_1 p_{b|1}}{q_0 p_{b|0}}\right) < \frac{1}{4\beta^3}. \tag{24}$$

This tells us that (i) $q'_0 < 1/(4\beta^3)$. Furthermore, using Eq. 15, we know $q'_0 > q_0/2$. Using this,
and the fact that $q_0 + q_1 = 1$, we can write Eq. 24 as:

$$\frac{1 + (q_1 p_{b|1})/(q_0 p_{b|0})}{1 + q_1/q_0} < \frac{1}{2\beta^3}. \tag{25}$$

From this equation, we have the restriction $q_1/q_0 > 2\beta^3$ ($2\beta^3 \gg 1$ so we can ignore the 1 in the
denominator), and (ii) $p_{b|1}/p_{b|0} < 1/(2\beta^3)$.
Now let's consider $p_{\mathrm{partial}(g)}/p_{\mathrm{partial}(a)}$:

$$\frac{p_{\mathrm{partial}(g)}}{p_{\mathrm{partial}(a)}} = \frac{p_{b|0}\,q'_0 + p_{b|1}\,q'_1}{(p_{b|0} + p_{b|1})/2}. \tag{26}$$

From (ii), let $p_{b|0}/p_{b|1} = 2\beta^3 M_1$, where $M_1 > 1$. From (i), let $q'_0 = 1/(4\beta^3 M_2)$ where $M_2 > 1$, so
then $q'_1 = 1 - 1/(4\beta^3 M_2)$. Then

$$\frac{p_{\mathrm{partial}(g)}}{p_{\mathrm{partial}(a)}} = \frac{2\beta^3 M_1/(4\beta^3 M_2) + (1 - 1/(4\beta^3 M_2))}{\beta^3 M_1 + 1/2}. \tag{27}$$

Since $\beta^3 M_1 \gg 1/2$, we have

$$\frac{p_{\mathrm{partial}(g)}}{p_{\mathrm{partial}(a)}} < \frac{1}{2\beta^3 M_2} + \frac{1}{\beta^3 M_1} - \frac{1}{4\beta^3 M_1 M_2\,\beta^3} < \frac{2}{\beta^3}. \tag{28}$$

□
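The bias ratio expression used in this proof is easy to evaluate directly. The sketch below is an illustration only: the function name `bias_ratio` and its argument names are hypothetical, and `q0_prime` stands for the probability that $v(a) = 0$ based on leaves descending from $g$.

```python
def bias_ratio(p_b0, p_b1, q0_prime):
    """Bias ratio p_partial(g) / p_partial(a).

    p_b0, p_b1: probability of the exception-triggering leaf value
                conditioned on v(a) = 0 and v(a) = 1 respectively.
    q0_prime:   probability that v(a) = 0 given leaves descending from g;
                the complementary probability is 1 - q0_prime.
    """
    q1_prime = 1.0 - q0_prime
    numerator = p_b0 * q0_prime + p_b1 * q1_prime
    denominator = (p_b0 + p_b1) / 2
    return numerator / denominator
```

With an unbiased prior (`q0_prime = 0.5`) the ratio is 1 for any `p_b0` and `p_b1`, and a strongly biased prior of the kind considered in the proof drives it below $2/\beta^3$.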

We can now prove the following lemma: 



Lemma C.6. For the case $(k, l)$ in Lemma B.1, if an exception on the main tree does not occur,
then if an algorithm evaluates less than $\beta^l$ leaves, $S > n^{3/5}\beta^{-3}/2$ if $k = l$ and $S > 2n$ otherwise.

Proof. For the case $l = k$, we know with high probability at most $\beta$ nodes in $L$ from directly
evaluating descendants of those nodes. (We learn $\beta - 1$ nodes from evaluating $\beta^{k-1}$ leaves on
each of $\beta - 1$ subtrees, and 1 from one possible exception on a subtree.) Thus at most $\beta$ times
during the algorithm, a $p_{\mathrm{root}}(a, r)$ is set to 0, and its pair is set to 2. When the $\beta^{k-1}$'th leaf
descending from a node $a$ is evaluated, this can be thought of as a query for $a$. So the division
process proceeds identically as in the case $k = 1$, except for two differences. First, when the $j$th
node in $L$ is learned with certainty, we must multiply $S$ by a normalizing factor $F_j$. This is due to
renormalizing $p_{\mathrm{root}}(a, 0) + p_{\mathrm{root}}(a, 1) = 2$ in the $k > 1$ case, whereas in the $k = 1$ case, when a node
$a$ at depth $n$ is learned, $p_{\mathrm{root}}(a, 0) + p_{\mathrm{root}}(a, 1) = 1$. (Note $F_j > 1$ for all $j$, and further, we don't
actually care what the $F_j$ are, since $D$ will be multiplied by the same $F_j$'s.) Second, by Lemma
C.5, the one possible exception in a subtree causes $S$ to decrease by at most an extra factor of $4\beta^3$
as compared to the $k = 1$ case. Combining these terms, we get that the decrease in $\log S$, $\Delta \log S$,
is at most $(2 \log n)/5 + \log 4\beta^3 - \sum_j \log F_j$.



Next we need to estimate the decrease in $\log S$ due to nodes in $L$ where less than $\beta^{k-1}$ of their
leaves have been evaluated. There are $\beta$ times in the algorithm when we learn the value of a node
in $L$. There are then $\beta + 1$ intervals, between, before, and after those times. We will determine
the maximum amount that $\log S$ can change during a single interval, and then multiply by $\beta + 1$
to account for the change during all intervals. We will be considering intervals because we will
be using Lemma C.4, which only applies in periods in between when the values of nodes in $L$ are
learned.

For a given interval, let $M(a)$ be the number of leaves descending from $a \in L$ that have been
evaluated. Then for $1 \le j \le l$ let $K_j = \{a \in L : \beta^{j-1} < M(a) \le \beta^j\}$. Because less than $\beta^k$ leaves
have been evaluated in total, we have the bound $|K_j| < \beta^{k-j+1}$.
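The grouping into the sets $K_j$ and the counting bound can be sketched as follows. The function name `bucket_nodes` and the dictionary of evaluation counts are assumptions made for the example. The bound follows because every node in $K_j$ accounts for more than $\beta^{j-1}$ evaluations, so $|K_j|\,\beta^{j-1}$ is less than the total number of evaluations.

```python
def bucket_nodes(eval_counts, beta, l):
    """Group nodes by evaluation count M(a).

    K[j] holds the nodes a with beta**(j-1) < M(a) <= beta**j, for 1 <= j <= l.
    Nodes with M(a) <= 1 or M(a) > beta**l are placed in no bucket.
    """
    K = {j: [] for j in range(1, l + 1)}
    for a, m in eval_counts.items():
        for j in range(1, l + 1):
            if beta ** (j - 1) < m <= beta**j:
                K[j].append(a)
                break
    return K
```

If the total number of evaluations is below $\beta^k$, dividing by $\beta^{j-1}$ gives the stated bound $|K_j| < \beta^{k-j+1}$.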
We will bound the change in $S$ using the relation between $p_{\mathrm{root}}$, $p_{\mathrm{tree}}$, and $p_{\mathrm{cat}}$ described in the
paragraph following Eq. 20 (as opposed to the bound used in Lemma C.1). This bound applies
regardless of the order of evaluations, so we are free to reorder to group together evaluations
descending from the same node in $L$. This allows us to use Lemma C.4: each of the nodes in $K_j$
can change $\log S$ by at most $4c_{k-1,j}$ ($2c_{k-1,j}$ from $p_{\mathrm{root}}(a, 0)$ and $2c_{k-1,j}$ from $p_{\mathrm{root}}(a, 1)$). Since
there are at most $\beta^{l-j+1}$ nodes in $K_j$, the nodes in $K_j$ will change $\log S$ by at most $4\beta^{l-j+1} c_{k-1,j}$.
Summing over the $K_j$ in all $(\beta + 1)$ intervals, we have

$$\Delta \log S < 4(\beta + 1)\left(c_{k-1,k-1}\beta^2 + c_{k-1,k-2}\beta^3 + \cdots + c_{k-1,1}\beta^{k+1}\right). \tag{29}$$

Combining with the terms from the division process and subtree exception, we obtain

$$\Delta \log S < \frac{2 \log n}{5} + \log(4\beta^3) - \sum_j \log F_j + 4(\beta + 1)\left(c_{k-1,k-1}\beta^2 + \cdots + c_{k-1,1}\beta^{k+1}\right). \tag{30}$$



Since this is an induction step, we can plug in values for $c_{k-1,j}$ from Lemma B.1, which, using the
asymptotic definition, gives

$$\Delta \log S < \frac{2 \log n}{5} + \log(4\beta^3) - \sum_j \log F_j. \tag{31}$$

Finally, using the fact that initially $S = 2n$, we find $S > \left(\prod_j F_j\right) n^{3/5}\beta^{-3}/2$.

For the case $l < k$, by similar reasoning as above, we can reorder to put the evaluations of leaves
from the subtree with the one possible exception first. Below in Lemma C.7, we show that a Type
III exception cannot occur for $l < k$, so we don't need to worry about the extra decrease from the
bias ratio. But the one exception gives us knowledge of only one node $a$ in $L$. Suppose $a = 1$.
Then all trees $t$ with $a = 1$ will have $p_{\mathrm{tree}}(t) = 2$ and all trees $t$ with $a = 0$ will have $p_{\mathrm{tree}}(t) = 0$.
Since each $p_{\mathrm{cat}}$ involves an equally weighted average of trees $t$ with $a = 0$ and $a = 1$, each $p_{\mathrm{cat}}$
will be set to 1 when $a$ is learned, so $S$ is unchanged when $a$ is learned and remains $2n$. Otherwise,
we still have the reduction in $S$ due to the $K_j$ sets as above, except with now only one interval, so
we obtain:

$$\Delta \log S < 4\left(c_{k-1,l}\beta + \cdots + c_{k-1,1}\beta^l\right). \tag{32}$$

By induction we can plug in values for $c_{k-1,j}$, and using the asymptotic relation, we find at the end
of the computation, $S > 2n$. □

Lemma C.7. Type III exceptions can only occur when $l = k$.

Proof. We will show that when $l < k$, the bias ratio cannot be smaller than $2/\beta^3$. Suppose $l < k$,
and we are concerned that a Type III exception might happen at a subtree rooted at $a \in L$, where $g$
is the root, due to measuring a leaf $b$. Let $q'_r$ be the probability that $v(a) = r$ based on all evaluated
leaves descending from $g$. We will show $q'_r$ is close to 1/2 before $b$ is evaluated, and therefore that
the bias ratio is close to 1.
We can write:

$$q'_0 = N \sum_{t : v(t,a) = 0} w_t\, p_{\mathrm{tree}}(t) = N \sum_{t : v(t,a) = 0} w_t \prod_{a' \in L} p_{\mathrm{root}}(a', v(t, a')), \tag{33}$$

where $N$ is a normalization factor to ensure $q'_0 + q'_1 = 1$, and $w_t$ is the probability associated with
$t$ in the distribution $\mathcal{T}_1$. $q'_1$ can be written similarly. Note there can be no exception giving with
high probability the value of any other $a' \in L$ before $b$ is measured, as if there were, we couldn't
have a Type III exception at $a$, since the exception at $a$ would simply become a Type I exception.
Also, since $l < k$, less than $\beta^{k-1}$ leaves have been evaluated, so we can apply Lemma C.4 to all nodes
$a' \in L$.

Using the sets $K_j$ from Lemma C.6, we have that the number of nodes in $L$ that have $\beta^j$
leaves evaluated is less than $\beta^{l-j+1}$. Each of those nodes can have $p_{\mathrm{root}}(a', r)$ that is at most
$1 + c_{k-1,j}$ and at least $1 - c_{k-1,j}$. If any node $a' \in L$ has not had any evaluations, then
$p_{\mathrm{root}}(a', r) = 1$.

Then

$$q'_0 \le N \sum_{t : v(t,a) = 0} w_t \prod_{j=1}^{l} (1 + c_{k-1,j})^{\beta^{l-j+1}} \approx N/2, \tag{34}$$

and

$$q'_0 \ge N \sum_{t : v(t,a) = 0} w_t \prod_{j=1}^{l} (1 - c_{k-1,j})^{\beta^{l-j+1}} \approx N/2. \tag{35}$$

So up to negligible factors, $q'_0 = N/2$. The same reasoning holds for $q'_1$, and using the normalization
$q'_1 + q'_0 = 1$, we have $q'_0 = q'_1 = 1/2$. Then the bias ratio is (from Eq. 26)

$$\frac{p_{\mathrm{partial}(g)}}{p_{\mathrm{partial}(a)}} = \frac{p_{b|0}\,q'_0 + p_{b|1}\,q'_1}{(p_{b|0} + p_{b|1})/2} = 1 \tag{36}$$

up to negligible factors. □



C.2.4 Bounding D 

To prove the lower bound on D, we reorder so that evaluations of leaves descending from a single 
node in L are grouped together (there are no intervals this time) . We are allowed to do this because 
D only depends on the final set of leaves that are evaluated, so if we reorder and find a bound on 
D with the new ordering, it must also be a bound on any other ordering. Furthermore, none of the 
Lemmas we use in proving the bound on D, or the probability of exception when determining D 
depend on ordering, so we can reorder freely. 

We put evaluations for leaves descending from the at most $\beta$ sure nodes in $L$ first (the sure
nodes are those obtained either by the one exception on a subtree or by being given the value by
the algorithm after $\beta^{k-1}$ nodes of a subtree are evaluated). We call Phase I the evaluation of leaves
descending from those sure nodes, and we call Phase II the evaluation of leaves after Phase I. Phase
I evaluations exactly mimic the $k = 1$ case for bounding $D$, where the last evaluation to each node
$a \in L$ can be thought of as a query of that node. Thus in Phase I, the evolution of $p_{\mathrm{cat}}(i, r)$ follows
the $k = 1$ case, except for the extra factors $F_j$ described above.

From this point, we will only discuss what happens in Phase II. Phase II evaluations don't lead
to high confidence knowledge of nodes in $L$, but we still need to bound how much confidence can be
obtained. For each $a \in L$ whose leaves are evaluated in Phase II, let $p_{\mathrm{node}|i,r}(a)$ be the probability
that a tree remaining in $\mathcal{T}_{k,r,i}$ will have $v(a) = 0$. Analogously to the $k = 1$ case, at the beginning
of Phase II, for each $a$ there is a height $h$ that is the distance between $a$ and sure nodes, such that
$p_{\mathrm{node}|i,r}(a) = 1/2$ for $i > h$, and $p_{\mathrm{node}|i,r}(a) \in \{0, 1\}$ for $i + n_0 < h$. If $p_{\mathrm{node}|i,r}(a) \in \{0, 1\}$, it won't
change with further evaluations. However, we must bound how much $p_{\mathrm{node}|i,r}(a)$ can change if at
the beginning of Phase II it is equal to 1/2.

First we will bound how much $p_{\mathrm{node}|i,r}(a)$ can change due to evaluations on other nodes $a' \in L$
during Phase II, where $p_{\mathrm{node}|i,r}(a)$ is initially 1/2. As in Lemma C.6, we group $a'$ into the sets
$K_j$. For any tree $t \in \mathcal{T}_1$ and $a' \in K_j$, $\Delta \log p_{\mathrm{tree}}(t) < 2c_{k-1,j}$ by Lemma C.4. Since we are
considering the case where $p_{\mathrm{node}|i,r}(a) = 1/2$ initially, half of the trees $t$ with category $i$ and root $r$
have $a = 0$ and half have $a = 1$. Since each $a'$ can change $p_{\mathrm{tree}}$ by at most $1 \pm 2c_{k-1,j}$, the worst
case is when for all trees $t$ with $a = 0$, their probabilities change by $1 + 2c_{k-1,j}$, and all trees with
$a = 1$ change by $1 - 2c_{k-1,j}$, for each $a' \in K_j$. Thus we have $\Delta \log p_{\mathrm{node}|i,r}(a) < 4c_{k-1,j}$. Summing
over the sets $K_j$, we obtain

$$\Delta \log p_{\mathrm{node}|i,r}(a) < 4\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right) \tag{37}$$

where $l$ is strictly less than $k$, since we are in Phase II. This means

$$\left|p_{\mathrm{node}|i,r}(a) - \tfrac{1}{2}\right| < 4\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right). \tag{38}$$

We will use this result in calculating $D$.

To compute $D$, we need to track the evolution of $p_{\mathrm{cat}}(i, r)$ in Phase II. We will show that we
can do this in terms of $p_{\mathrm{node}|i,r}(a)$ and $p_{\mathrm{root}}(a, r)$. Let $p_{\mathrm{cat}}(i, r)_i$ be the value of $p_{\mathrm{cat}}(i, r)$ before any
leaves descending from $a$ are evaluated, and let $p_{\mathrm{cat}}(i, r)_f$ be the value of $p_{\mathrm{cat}}(i, r)$ after all leaves
descending from $a$ are evaluated. Then

$$p_{\mathrm{cat}}(i, r)_f = p_{\mathrm{cat}}(i, r)_i\left(p_{\mathrm{node}|i,r}(a)\,p_{\mathrm{root}}(a, 0) + (1 - p_{\mathrm{node}|i,r}(a))\,p_{\mathrm{root}}(a, 1)\right), \tag{39}$$

where $p_{\mathrm{root}}(a, 0)$, $p_{\mathrm{root}}(a, 1)$, and $p_{\mathrm{node}|i,r}(a)$ are the values after all leaves descending from $a$ are
evaluated. By induction, we have $|p_{\mathrm{root}}(a, r) - 1| < c_{k-1,j}$ if fewer than $\beta^j$ leaves of $a$ are evaluated.
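The update rule of Eq. (39) can be written as a one-line function. The sketch below uses hypothetical argument names and assumes the normalization $p_{\mathrm{root}}(a, 0) + p_{\mathrm{root}}(a, 1) = 2$ introduced earlier. Under that normalization, with $|p_{\mathrm{root}}(a, r) - 1| \le c$, the multiplier is a weighted average of $1 + c$ and $1 - c$, so it stays within $[1 - c, 1 + c]$ and equals 1 when $p_{\mathrm{node}|i,r}(a) = 1/2$.

```python
def update_p_cat(p_cat_initial, p_node, p_root0, p_root1):
    """Eq. (39): rescale p_cat by a p_node-weighted average of the root probabilities.

    p_node  : probability that a remaining tree has v(a) = 0
    p_root0 : p_root(a, 0) after all leaves below a are evaluated
    p_root1 : p_root(a, 1) after all leaves below a are evaluated
    """
    return p_cat_initial * (p_node * p_root0 + (1 - p_node) * p_root1)
```

The three cases that follow differ only in what is known about `p_node` when this update is applied.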
Now we will see how p C at(h r) is updated in the three cases: 

• [$i + n_0 < h$] Here we know $p_{\mathrm{node}|i,0}(a)$ and $p_{\mathrm{node}|i,1}(a)$ are either both 0 or both 1. Then by
Eq. (39), and using the bound on $p_{\mathrm{root}}(a,r)$ from Eq. (21), $p_{\mathrm{cat}}(i,r)$ will be multiplied by at
most $1 + c_{k-1,j}$.

• [$i > h$] Now there can be a difference between $p_{\mathrm{node}|i,0}(a)$ and $p_{\mathrm{node}|i,1}(a)$, but this difference
is bounded by Eq. (38). We have the bound on $p_{\mathrm{root}}(a,r)$ from Eq. (21). Plugging into Eq.
(39), and using the fact that $p_{\mathrm{cat}}(i,r)_i < 1$, we obtain $p_{\mathrm{cat}}(i,r)_f$ by multiplying $p_{\mathrm{cat}}(i,r)_i$ by at
most $1 + c_{k-1,j}$, and then adding a quantity $\gamma < 16c_{k-1,j}\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right)$.
Since the added term is from a single $i$, the addition to $D$ from all $i$'s is $< h \cdot 16c_{k-1,j}\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right)$.

• [$i < h < i + n_0$] Without knowing the details of the direct function being evaluated by the
tree, we have no knowledge of $p_{\mathrm{node}|i,r}(a)$. However, $|p_{\mathrm{root}}(a,0) - p_{\mathrm{root}}(a,1)| \le 2c_{k-1,j}$, so
plugging into Eq. (39) and using the fact that $p_{\mathrm{cat}}(i,r)_i < 1$ and $p_{\mathrm{node}|i,r}(a) < 1$, we get that
this step adds a quantity $\gamma < 2n_0 c_{k-1,j}$ to $D$.
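
The three cases can be explored numerically. The following Python sketch is illustrative only. It assumes Eq. (39) has the form $p_{\mathrm{cat}}(i,r)_f = p_{\mathrm{cat}}(i,r)_i\,[p_{\mathrm{node}|i,r}(a)\,p_{\mathrm{root}}(a,0) + (1 - p_{\mathrm{node}|i,r}(a))\,p_{\mathrm{root}}(a,1)]$, that $|p_{\mathrm{root}}(a,\cdot) - 1| \le c$, and that $|p_{\mathrm{node}|i,r}(a) - 1/2| \le 4\sigma$. Here `c` and `sigma` are stand-ins introduced for this sketch for $c_{k-1,j}$ and for the sum in Eq. (38), and all function names are hypothetical. The scan reports the largest multiplier applied to $p_{\mathrm{cat}}(i,r)_i$ and the largest gap between the multipliers for $r = 0$ and $r = 1$.

```python
import itertools


def eq39_factor(p_node, p_root0, p_root1):
    """Bracketed multiplier of Eq. (39), under the reading stated in the lead-in."""
    return p_node * p_root0 + (1.0 - p_node) * p_root1


def update_p_cat(p_cat_i, p_node, p_root0, p_root1):
    """Value of p_cat(i, r) after all leaves below node a are evaluated."""
    return p_cat_i * eq39_factor(p_node, p_root0, p_root1)


def worst_case(c, sigma, grid=21):
    """Scan extreme inputs; return (largest multiplier, largest r=0 vs r=1 gap).

    Assumes 0 <= c < 1 and 0 <= sigma <= 1/8, so every p_node value lies in [0, 1].
    """
    roots = [1.0 - c, 1.0 + c]  # extreme values allowed for p_root(a, .)
    lo, hi = 0.5 - 4.0 * sigma, 0.5 + 4.0 * sigma
    nodes = [lo + (hi - lo) * t / (grid - 1) for t in range(grid)]
    max_factor, max_gap = 0.0, 0.0
    for r0, r1 in itertools.product(roots, repeat=2):
        for pn_r0, pn_r1 in itertools.product(nodes, repeat=2):
            f0 = eq39_factor(pn_r0, r0, r1)  # multiplier when the root value is 0
            f1 = eq39_factor(pn_r1, r0, r1)  # multiplier when the root value is 1
            max_factor = max(max_factor, f0, f1)
            max_gap = max(max_gap, abs(f0 - f1))
    return max_factor, max_gap


if __name__ == "__main__":
    factor, gap = worst_case(c=0.01, sigma=0.02)
    print(factor, gap)
```

Under these assumptions the multiplier never exceeds $1 + c$, because it is a weighted average of two numbers in $[1 - c, 1 + c]$. The gap between the two multipliers equals $(p_{\mathrm{node}|i,0}(a) - p_{\mathrm{node}|i,1}(a))(p_{\mathrm{root}}(a,0) - p_{\mathrm{root}}(a,1))$, which is at most $8\sigma \cdot 2c = 16c\sigma$. This matches the form of the additive term in the second case and vanishes when the two $p_{\mathrm{node}}$ values coincide, as in the first case.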

Now we need to sum these contributions for all $a \in L$ in Phase II. This summation simply
expands each of the $c_{k-1,j}$ in the above cases into $\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right)$. Putting
the additions before the multiplications, if $D_i$ is $D$ at the beginning of Phase II, and $D_f$ is $D$ at
the end of Phase II, we have

$$D_f < \Big[ D_i + 16n\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right)^2 + 2n_0\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right) \Big] \times \Big(1 + \left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l\right)\Big). \tag{40}$$



Using our inductive assumption to plug in for $c_{k-1,l-i}$ from Lemma B.1, we find that each term
$c_{k-1,l-i}\beta^{i+1} < 8n_0 n^{-3/5}\beta^7$. So, aside from $D_i$, the largest term in the equation scales like $n^{-1/5}$,
which goes to $0$ for large $n$.
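
The $n^{-1/5}$ scaling can be read off from Eq. (40) term by term. The following is a sketch under stated assumptions: $\Sigma$ is shorthand introduced here for the sum $c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l$, each term of that sum is taken to be $O(n^{-3/5})$ as stated above, and the number of terms, $n_0$, and $\beta$ are assumed not to grow with $n$.

```latex
% \Sigma is shorthand introduced for this sketch.
% Assumption: the number of terms, n_0 and \beta are independent of n.
\begin{align*}
\Sigma &= c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots + c_{k-1,1}\beta^l = O\!\left(n^{-3/5}\right), \\
16 n \Sigma^2 &= O\!\left(n \cdot n^{-6/5}\right) = O\!\left(n^{-1/5}\right), \\
2 n_0 \Sigma &= O\!\left(n^{-3/5}\right), \\
1 + \Sigma &\to 1 \quad \text{as } n \to \infty .
\end{align*}
```

Under these assumptions the quadratic term $16n\Sigma^2$ is the slowest to decay, which is consistent with the statement that the largest term aside from $D_i$ scales like $n^{-1/5}$.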

For the case $k = l$, from the $k = 1$ case, we have $D_i < \prod_j F_j n_0 \beta$. (The $F_j$ comes from
renormalizing $p_{\mathrm{root}}$ in Eq. (20).) So $D_i$ dominates the sum (since the $F_j$ are all larger than 1), and
thus $D_f < \prod_j F_j n_0 \beta$. We obtain, using our bound from Section C.2.3,

$$c_{k,k} \le \frac{D}{S} \le \frac{2n_0\beta^4}{n^{3/5}}. \tag{41}$$



For the case $l < k$, we can get at most 1 sure node in $L$ from one exception in a subtree. In that
case, Phase I can never get started, so we go straight into Phase II. Thus $D_i = 0$, and the above
sum is dominated by the terms $16n\left(c_{k-1,l}\beta\right)^2$ and $2n_0 c_{k-1,l}\beta$. From our bound on $S$ from Section



C.2.3, we have 5* > In. Combining these two bounds, we obtain 



$$c_{k,l} < (2n_0)^{k-l+1}\, n^{-3(k-l+1)/5}\, \beta^{4+6(k-l)}. \tag{42}$$



This proves the $c_{k,l}$ bound of Lemma B.1.